
Facebook Kills Dataset of Crawled Public Profiles

CmdrTaco posted about 4 years ago | from the creepy-crawlies dept.

Social Networks 158

holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated the public profiles of 210 million Facebook users and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he had it, this means anyone who wants it made their own version of this.


158 comments

For an Interesting Exercise in Head Asplosion (4, Interesting)

eldavojohn (898314) | about 4 years ago | (#31688234)

Fearing costs, Warden has now destroyed his dataset.

Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

Then Facebook could ask the EFF to protect their users' privacy and information being sold to marketers and corporations (sorry, when you're introduced as "Internet entrepreneur" that means there's profit to be had).

Re:For an Interesting Exercise in Head Asplosion (4, Insightful)

paeanblack (191171) | about 4 years ago | (#31688532)

Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.

If you don't like that reality, keep it in mind next time you vote.

Re:For an Interesting Exercise in Head Asplosion (1, Redundant)

truthsearch (249536) | about 4 years ago | (#31688662)

Except Facebook is claiming he violated its terms of service (a contract), not the law.

Re:For an Interesting Exercise in Head Asplosion (3, Informative)

Tobor the Eighth Man (13061) | about 4 years ago | (#31688706)

Not really a meaningful distinction, as contract law is very much an aspect of the law. We can bicker about whether terms of service are enforceable and to what extent, but the reality is that this guy has better things to do than wage a complex and almost certainly protracted legal battle against a corporation.

Re:For an Interesting Exercise in Head Asplosion (1)

tibit (1762298) | about 4 years ago | (#31688792)

How does the sticking power of TOS test out in court? Do facebook's TOS actually mean anything, if all you need to do to access their site is to type in a URL? I mean there isn't even a clickthrough to have them pretend like they care. Yes, I seriously would like to know.

Re:For an Interesting Exercise in Head Asplosion (1)

elnyka (803306) | about 4 years ago | (#31689712)

How does the sticking power of TOS test out in court? Do facebook's TOS actually mean anything, if all you need to do to access their site is to type in a URL? I mean there isn't even a clickthrough to have them pretend like they care. Yes, I seriously would like to know.

Those are excellent questions that need to be resolved either amicably or in a court of law (which is what was going to happen.) The latter is expensive. Unless you have some powerful backer$ you can't do it alone (though it begs the question why he didn't contact the EFF in the first place.)

Re:For an Interesting Exercise in Head Asplosion (0)

Anonymous Coward | about 4 years ago | (#31690612)

EULAs and whatnot have been verified by US courts, and you'd better believe Facebook lawyers know exactly which judges agreed and which district to file with.

Re:For an Interesting Exercise in Head Asplosion (0)

Anonymous Coward | about 4 years ago | (#31688934)

What evidence do you have that the guy even agreed to the terms of service? If you put something online you give people permission to view it, and if you don't make them log in and click "I agree" then they're not bound by your terms of service. If you don't block access to it via robots.txt you give bots permission to view it, and the same applies. I'm not saying that you give up all rights to anything you put online; copyright law, trademark law, etc still applies. But what this guy was doing violated neither of those.

By saying he violated their terms of service they are saying that by crawling their site he implicitly agreed to their terms of service. That's like me putting a notice at the end of this comment saying "by reading this comment you agree to my terms of service"--legally preposterous. It's an issue of deep legal pockets vs. a small research team, and Facebook wanting to restrict free versions of their demographic information--because they want to be able to charge money.

Re:For an Interesting Exercise in Head Asplosion (0)

Anonymous Coward | about 4 years ago | (#31689416)

But you don't give them the right to COPY it. He has the right to view it sure. But to make copies? Nope. The stuff is copyrighted. So what he did was not legal.

Re:For an Interesting Exercise in Head Asplosion (1)

gorzek (647352) | about 4 years ago | (#31689682)

Copyright is not absolute. Phone books, for instance, are not copyrighted because they are collections of facts--namely, addresses and phone numbers.

Likewise, he could copy all sorts of factual information about the users on Facebook: their names, contact information, friends, etc. He could likely not get away with copying their photos, status updates, and so forth since those can constitute creative works and are thus copyrighted.

Nevertheless, just because something is online doesn't mean it's automatically copyrighted. Facts themselves are not.

That's most likely why Facebook went after him using the TOS claim rather than a copyright infringement claim.

Re:For an Interesting Exercise in Head Asplosion (2, Informative)

crashumbc (1221174) | about 4 years ago | (#31689458)

unless something has changed, you have to "login" to see anything in Facebook. Even if a page is "public" you can't view it without logging in with your own account.

A crawler may or may not bypass that...

Re:For an Interesting Exercise in Head Asplosion (4, Insightful)

dubbreak (623656) | about 4 years ago | (#31689138)

Not really a meaningful distinction, as contract law is very much an aspect of the law.

If he was using an account I could see there being an enforceable contract (e.g. if you accept these terms of service we will give you an account). If he was just crawling publicly viewable facebook pages, then what is the consideration? I'd argue there is none and therefore no contract exists. You aren't forced to log in to view many pages and it's not like they even have a click through "I agree" TOS on each publicly viewable page. He broke no laws and there is no enforceable contract.

If facebook doesn't want people crawling publicly viewable pages then make them private (login required) or at least have a robots.txt that prohibits crawling of those pages.

Re:For an Interesting Exercise in Head Asplosion (1)

Svartalf (2997) | about 4 years ago | (#31689438)

robots.txt requires that a crawling app HONOR said file.

Re:For an Interesting Exercise in Head Asplosion (1, Insightful)

Anonymous Coward | about 4 years ago | (#31689558)

But if robots.txt disallowed crawling, then Facebook would be able to show that their intent was to not allow this type of data access.

Re:For an Interesting Exercise in Head Asplosion (2, Funny)

K. S. Kyosuke (729550) | about 4 years ago | (#31689018)

Except Facebook is claiming he violated its terms of service (a contract), not the law.

To me, this claim seems to be as legitimate as a public library claiming that I read too many books and threatening to sue me.

Re:For an Interesting Exercise in Head Asplosion (1)

roseblood (631824) | about 4 years ago | (#31691042)

Since this is publicly available data and all the guy did was send an automated web browser to go download it, does this mean Facebook has threatened him with a lawsuit for doing what every visitor to facebook does already? Granted, he likely did it much faster than any other individual has done. It's just wonky.

Re:For an Interesting Exercise in Head Asplosion (2, Insightful)

Registered Coward v2 (447531) | about 4 years ago | (#31688748)

Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.

If you don't like that reality, keep it in mind next time you vote.

I'm not sure what he did was not legal; but the article is pretty clear he doesn't have the resources to fight it in a court and so decided to destroy it. Maybe someone with more money and time may someday decide to fight it and the legality of scraping information will be clarified by a court.

To me, the real question is how do TOS square with robot files? Given the generally accepted and followed practice of their use; does not forbidding crawling implicitly allow the data to be collected and used as the scraper sees fit?

If you view the data as facts; then they are not copyrightable and so aggregating them would be permissible; assuming the TOS is not binding if a scraper follows the robots.txt instructions. If that is the case, I'd guess a lot more robots.txt files would prohibit scraping.

At any rate, I'd say the real world rules are not real clear here, other than the one that says "avoid picking a legal fight with someone who has a ton more money and lawyers than you."

Personally, I wouldn't be surprised if someone else already has the same data; but rather than publicize it they simply are using it however they see fit.

Re:For an Interesting Exercise in Head Asplosion (1, Insightful)

Anonymous Coward | about 4 years ago | (#31688752)

Or we could do what America did. Violent revolution and genocidal extermination of the existing inhabitants of the lands we wish to own. That works better than voting and is a very, very American thing to do.
 
Rule of law? You are fucking joking right?

Re:For an Interesting Exercise in Head Asplosion (4, Insightful)

geekoid (135745) | about 4 years ago | (#31688820)

Yes, but you can collect data and publish it as such. Scientific data, not data in the computer sense.
He should have kept his mouth shut, compiled the data, and then just submitted it to a number of journals. At that point Facebook needs to go after the journals. Facebook would have a tough time winning, and even if they did win, going after the journals would be bad PR. So no real win there. Their best bet would be to actually help him after the fact and look at the data to ensure that an "individual's privacy has not been violated".

The data on social networking sites is amazing and could teach us a lot about human nature.

Re:For an Interesting Exercise in Head Asplosion (1, Insightful)

Anonymous Coward | about 4 years ago | (#31689080)

Finding something on the web does not give you the legal authority to publish and redistribute it.

Why not? Copyright?

Copyright law (at least in the US) does not cover data.

Which is probably why Facebook said it was a "contract" violation.

Re:For an Interesting Exercise in Head Asplosion (1)

wprowe (754923) | about 4 years ago | (#31689288)

Technically, any text that an individual writes on Facebook is copyrighted as their own creative work, like it or not. Redistributing it in whole would violate that individual's copyrights on their own text. Anonymous, aggregated statistics probably would not be governed by that. Whether the Facebook TOS could be imposed on anonymous crawling of the site is the real legal question one would have to answer, I guess. Is crawling the site and copying the data comparable to viewing the site? Facebook might argue that it is, and then might argue therefore that their TOS are enforceable.

Re:For an Interesting Exercise in Head Asplosion (2, Informative)

Rantastic (583764) | about 4 years ago | (#31689148)

Finding something on the web does not give you the legal authority to publish and redistribute it.

Nonsense.

Allow me to call your attention to Fair use, a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as for commentary, criticism, news reporting, research, teaching or scholarship.

Of course, none of that is actually relevant as Facebook is not making a copyright claim. They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it.

Re:For an Interesting Exercise in Head Asplosion (0)

Anonymous Coward | about 4 years ago | (#31689218)

I fail to see how any politician can change the color or location of the sky simply by using rhetoric and a pen.

Re:For an Interesting Exercise in Head Asplosion (2, Interesting)

The Moof (859402) | about 4 years ago | (#31689378)

but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...

The summary says the crawler simply indexed public information. Why is this relevant? Well, recently, I noticed that Facebook Apps, all of which I have all disabled and blocked via my privacy settings, have started accessing my information again. Naturally, I assumed something got reset and started hunting for the settings again. Until I found this new block of text in all of their privacy settings:

When you visit a Facebook-enhanced application or website, it may access any information you have made visible to Everyone Edit Profile Privacy as well as your publicly available information. This includes your Name, Profile Picture, Gender, Current City, Networks, Friend List, and Pages. The application will request your permission to access any additional information it needs.

So they claim they can't stop people from acquiring and using my 'publicly available' information, because it's open to the public. Then, they turn around and go after this guy for indexing and using the same 'publicly available' information.

It all sounds a little two-faced to me.

Re:For an Interesting Exercise in Head Asplosion (0)

Anonymous Coward | about 4 years ago | (#31689510)

Does facebook actually have a copyright on its entire dataset, or do they just say, oh yeah, see that copyright symbol, you can't use our data? Or do they simply say we own everything on our servers and no one else can use any of the information in any way we don't like?

The first case is just funny. It reminds me of when people would write copyrights at the bottom of their personal webpages and think that they had somehow ownership of the information.

The second case presents a somewhat credible argument. Except it doesn't personally make any sense to me.

In either case, I've always been of the opinion that the internet is like an extension of the old bulletin board systems, which were an electronic extension of actual bulletin boards, which worked under the premise that anything on them was made available for public use. If you put up something on that BBS then it's your responsibility to control who you want to have access to the data. If someone is intelligent/devious enough to break your control then it is still the poster's fault for putting something they didn't want public on a public system.

I think the control of the internet that companies and governments are attempting will only be achieved once they realize that the internet and anything on it is public domain. They can then start to develop proper controls for the data they want to remain hidden from public use.

Re:For an Interesting Exercise in Head Asplosion (1)

blahplusplus (757119) | about 4 years ago | (#31689600)

"he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law."

Which are bought and sold by lobbyists. The law is such a joke because it always kowtows in some way or another to private interests.

Re:For an Interesting Exercise in Head Asplosion (0)

Anonymous Coward | about 4 years ago | (#31689950)

What is Google doing then? They cache most pages they "find on the web".

Re:For an Interesting Exercise in Head Asplosion (1)

CAIMLAS (41445) | about 4 years ago | (#31690076)

Finding something on the web does not give you the legal authority to publish and redistribute it

At the same time, if he never agreed to the EULA (and they did not require him to do so in order to read the content) then he's probably over-reacting in deleting the data. What laws might he be breaking, here? I'm not aware of any - though he was certainly setting himself up for wanton litigation on account of the bad publicity.

This isn't wanton publishing of said data. It's a 'derivative work'. Think: someone canvassing an area to see which kinds of grass are seeded in people's yards (and how well it's growing) and selling that information.

Re:For an Interesting Exercise in Head Asplosion (0)

Hurricane78 (562437) | about 4 years ago | (#31690388)

Yes it does give you the authority! Do you know nothing about how servers work?

If you look at a page on the web, you send a message to the server, asking “could you please give me that page there?”
And the server then can decide under what conditions it honors your request.
These rules are decided by the site hoster upon installation.
If the server gives you the page freely, and without any conditions (which nearly all web servers do), then you can do with it whatever you want.

Or in short: You passed it on. You split control. If you wanted something or some rules, you should have demanded them. Now it’s too late, so quit bitching! Maybe you learn something from it for next time.

That the laws of some fucked-up pseudo-government have nothing to do with it does not change a thing about those basic physics-based rules of reality.

Re:For an Interesting Exercise in Head Asplosion (0)

Anonymous Coward | about 4 years ago | (#31690702)

voting is a waste of time, get elected for office or hire lobbyists if you really want to see political change

No database copyright (1)

Animats (122034) | about 4 years ago | (#31690786)

Finding something on the web does not give you the legal authority to publish and redistribute it.

The US doesn't have "database copyright". The US has Feist vs. Rural Telephone, which says that "facts" can't be copyrighted. It's legal to scan in a phone book and load the address info into a database. You just can't reproduce the page layout; that's covered by copyright. That decision created the third-party phone book industry and began the era of widespread data mining.

The EULA issue is harder. If you're going to mine Facebook, you probably shouldn't have a Facebook account.

I'm surprised, though, that Facebook doesn't have systems which prevent programs from accessing pages in bulk.

Re:For an Interesting Exercise in Head Asplosion (1)

davidwr (791652) | about 4 years ago | (#31689336)

Practical answer: Next time, do your research overseas.

Commentary: It's sad when you have to do legal forum shopping before starting your research.

Re:For an Interesting Exercise in Head Asplosion (1)

mwvdlee (775178) | about 4 years ago | (#31690368)

Assuming all those profiles were indeed publicly available without having to log in to facebook, how could he have ever violated terms of service if he never agreed to any terms of service?

Am I to assume that anybody that has the misfortune to view a facebook profile without being a facebook member is automagically in violation of facebook's terms of service?

Re:For an Interesting Exercise in Head Asplosion (0)

Anonymous Coward | about 4 years ago | (#31690374)

He could have done that, yes. But there's no guarantee that they would be able to help and even if they could it would be an amazingly huge pain in the ass from any point of view.

Once again our bullshit legal system allows companies to bully innocent citizens into submission. The solution: Bomb facebook headquarters.

If Facebook had done this... (4, Insightful)

John Hasler (414242) | about 4 years ago | (#31688298)

...you'd be flaming them for invading your "privacy".

Facebook *did* do this (5, Insightful)

Chirs (87576) | about 4 years ago | (#31688398)

I see very little problem with an automated scan that respects robots.txt.

By not blocking automated access to the profiles, facebook is squarely at fault.

Robots.txt is insufficient. (4, Interesting)

way2trivial (601132) | about 4 years ago | (#31688632)

I'm sorry- it is..

robots.txt allows you to "refuse a specific named bot" or "refuse everyone" or "allow everything" or "allow these directories" or "only allow these directories"
(want a fascinating read? try robots.txt at your favorite government site- whitehouse.gov used to be fascinating stuff)
there is no way in robots.txt to permit crawling based on intent of information use like a CC license does

I can- with photographs, have a creative commons license that sez "use it for anything" "use it with credit to me" "free for non-commercial" etc.
I would WANT google to see my site, I would want bing to see my site- for the purposes of indexing in a search engine.
I can't say in robots.txt
"come in and index for search engines and relevance- but you may not use the data to collect information on our membership for marketing to or marketing their info to others"

If I build a website all about-- coffee- I want the information available to the general public,but from/on my site....

Re:Robots.txt is insufficient. (2, Informative)

truthsearch (249536) | about 4 years ago | (#31688772)

So you block all of your content from being indexed by Google? Because Google's also using your content for marketing.

Also, robots.txt doesn't refuse anything to anyone. It's just a suggestion that any system can ignore. If you don't want systems "seeing" your content, then you must remove your content from the internet or put it behind a wall. A crawler is just another client like a web browser. The internet is intentionally built without discrimination.
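
To make the two points above concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the bot names (`BadBot`, `SearchBot`) and the paths are all made up for illustration; nothing here is Facebook's actual file. It shows the per-agent and per-directory rules robots.txt can express, and that the check only happens if the crawler chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one named bot entirely, keep everyone
# else out of a single directory, and allow the rest of the site.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /members/
Allow: /
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if the hypothetical robots.txt permits `agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("BadBot", "/articles/coffee.html"),
        ("SearchBot", "/articles/coffee.html"),
        ("SearchBot", "/members/alice"),
    ]:
        print(agent, path, allowed(agent, path))
```

Note that `allowed()` is purely a courtesy check performed on the crawler's side. A crawler that never calls it can still request `/members/alice`, and the rules say nothing about what the fetched data may be used for.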

You are missing my point (3, Interesting)

way2trivial (601132) | about 4 years ago | (#31688958)

and I really think it is worth making.

Copyright protections are important, the snippet of text that google uses to let people know my site is relevant is easily fair use
I don't have a problem with it- I welcome it as it's beneficial for both myself and google for it to be there.

the ENTIRE TEXT of my site- copied and recopied to put into a web page that exists only to generate ad-sense revenue by a third party is not.
and if robots.txt had a 'license' mode, I'd have a much stronger case of protections if I chose to pursue a blatant copying and re-publication of my site.

robots.txt labels that I wish there were include
'allow function:indexing'
'disallow function:total and complete reproduction'
'disallow function: total and complete reproduction for XXX days'
(so I can allow wayback machine and equivalents)
'disallow function: aggregate data collection'
'disallow function: user data collection'
'disallow function: email collection'

looking at amazon, http://www.amazon.com/robots.txt [amazon.com]
they somewhat do this by putting the information they don't want released into the wild in its own directories
then disallowing those directories- actually, now that I look at it- it's a neat way to go..
but I'd still prefer a robots.txt option that covered different 'intended use of data to be crawled' permissions

Re:You are missing my point (1)

thePowerOfGrayskull (905905) | about 4 years ago | (#31689144)

the ENTIRE TEXT of my site- copied and recopied to put into a web page that exists only to generate ad-sense revenue by a third party is not.

You mean like google cache? I actually agree with you overall -- it's my data, not yours. You may not publicly exhibit copies of it for your own benefit. It's just that it's a difficult line to draw, in large part because of omnibus monetizing service providers like Google.

Re:You are missing my point (1)

wprowe (754923) | about 4 years ago | (#31689418)

If using an apache web server, one can use .htaccess to explicitly control content access. That still doesn't discern intent or use of the content.

Re:You are missing my point (1)

Hatta (162192) | about 4 years ago | (#31690744)

Copyright protections are important

Copyright is irrelevant here. Facts are not copyrightable. This data from Facebook is no different than the collection of data in the phone book. Republishing a page from Facebook or the phone book is illegal. Republishing facts sourced from those pages is not.

Re:Facebook *did* do this (2, Interesting)

sexconker (1179573) | about 4 years ago | (#31688722)

I see very little problem with an automated scan that respects robots.txt.

By not blocking automated access to the profiles, facebook is squarely at fault.

I see very little problem with an automated scan that doesn't respect robots.txt. (As long as it's accessing stuff normal people can get to.)

Anything a machine can do, a meatbag can do, though usually more slowly.
Most anything a meatbag can do, a bunch of meatbags can do much more quickly.

Robots.txt says go away? Amazon's Mechanical Turk says Thank You, Come Again.

Re:If Facebook had done this... (1, Insightful)

Anonymous Coward | about 4 years ago | (#31688404)

If Facebook had released this information we would be flaming?

They did and we still are.

(yes)

Re:If Facebook had done this... (5, Interesting)

2obvious4u (871996) | about 4 years ago | (#31688434)

Isn't this the golden egg of Facebook? I thought this is what they were selling. That data is fascinating, it is completely anonymous, yet at the same time very insightful for marketing purposes. I think Facebook is just upset because they plan on selling the same data that Pete was.

Re:If Facebook had done this... (4, Interesting)

NeutronCowboy (896098) | about 4 years ago | (#31689466)

Most likely. Facebook's gold mine isn't even so much the user information itself - it's the networks that they can build out of the relationship data. As of right now, they haven't figured out a way how to make money from it, but they certainly aren't going to let someone take the most valuable aspect of their system - the network information - and put it out in the open.

Personally, I hope someone does the same work, but uploads the raw data anonymously to a torrent somewhere.

Re:If Facebook had done this... (1)

Late Adopter (1492849) | about 4 years ago | (#31689656)

Except Pete can't actually sell the data, that would be a derivative work of their copyrighted web-pages. Sure he has the fair-use ability to publish academic studies, but he'd be limited to using the data internally.

Re:If Facebook had done this... (5, Insightful)

Altus (1034) | about 4 years ago | (#31688486)

why do you think they threatened him? they want to sell this data themselves.

Re:If Facebook had done this... (1)

moteyalpha (1228680) | about 4 years ago | (#31688936)

It seems that many of these data sets are public and easily accessible to analysis. I would find it interesting to simply use various forums like slashdot and have a ranking of who had the most insightful comments by user name. Certainly the data is available as people make it so. It seems that there is a schizophrenic aspect to this, people want to be recognized for what they represent and when they become too famous they get nervous about it.
I am sure that much of this data is already available in an organized form in many places like Google analytics.
I want to know who is the biggest Karma whore, how many times is XKCD linked , and why does that guy named AC have a fascination with goats.
It is also possible to look at commercial pages and identify which ads are placed and then determine who is spending the most money on ads. It then becomes a tool to see what competition is doing. I suspect that much of this data is already available to a number of organizations by a simple data base query.
So if China, Russia, NSA, Iran... data bases every bit of this info in its "secret" data bases that respect no bounds, and the public has no access to it, are we being cheated by being too private?

Re:If Facebook had done this... (1)

anglico (1232406) | about 4 years ago | (#31689194)

exactly! I wish I had mod points, oh wait you're already at 5! I was expecting to read a whole list of complaints against this practice when I started reading the comments, and was surprised to say the least.

Yes, by all means, let's stamp out... (3, Insightful)

jeffb (2.718) (1189693) | about 4 years ago | (#31688448)

...all the researchers who do everything in the open and with proper anonymization.

Re:Yes, by all means, let's stamp out... (1, Interesting)

jeffasselin (566598) | about 4 years ago | (#31688568)

You assume such anonymization is actually possible, I somehow doubt it.

Re:Yes, by all means, let's stamp out... (1)

geekoid (135745) | about 4 years ago | (#31688852)

It is, and it's done all the time in the scientific community.

I don't see why you would think removing people's names isn't possible.

Re:Yes, by all means, let's stamp out... (2, Interesting)

Anonymous Coward | about 4 years ago | (#31689066)

Even with names removed, data like this can often be traced back to the person. Your name isn't the only unique thing that appears in your facebook profile.

As an example, how many others share your permutation of friends and fan pages?
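
A rough sketch of that point, under loudly stated assumptions: the records below are invented toy data, and "fan pages" stands in for whatever attributes survive after names are stripped. The idea is simply to count how many records carry a combination of pages that nobody else shares, since those records are the easiest to trace back to one person.

```python
from collections import Counter

# Hypothetical "anonymised" records: names removed, fan pages kept.
records = [
    {"id": 1, "pages": {"coffee", "xkcd", "linux"}},
    {"id": 2, "pages": {"coffee", "xkcd"}},
    {"id": 3, "pages": {"coffee", "xkcd"}},
    {"id": 4, "pages": {"goats", "linux"}},
]


def unique_fingerprints(records):
    """Return the ids of records whose exact set of pages appears only once."""
    counts = Counter(frozenset(r["pages"]) for r in records)
    return [r["id"] for r in records if counts[frozenset(r["pages"])] == 1]


if __name__ == "__main__":
    print(unique_fingerprints(records))
```

In this toy data, records 1 and 4 have a page combination no one else has, so anyone who knows a person's likes from another source could pick them out even without a name. Real datasets have far more attributes per profile, which tends to make unique combinations more common, not less.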

Re:Yes, by all means, let's stamp out... (2, Informative)

thePowerOfGrayskull (905905) | about 4 years ago | (#31689192)

Removing names isn't necessarily enough. The recent Netflix case shows that [securityfocus.com]. I think it's interesting that nobody catches the broader implications of that discussion, namely that whether they're "anonymizing" data for purposes of providing it for research or selling it for marketing, the ability to reverse engineer patterns to undo it remains a risk.

"proper" doesn't mean what you seem to think (0)

Anonymous Coward | about 4 years ago | (#31690370)

You (and, sadly, many others looking to make a quick buck) seem to think that "proper" anonymization means removal of Personally-Identifiable Information (PII) from the data.

Removal of PII is neither sufficient nor, in certain cases, necessary for real anonymization. I'll leave the explanatory lecture for my next security class, but a very good rule of thumb for estimating whether an anonymization technique is adequate is whether applying that technique to all documents classified at the Secret level would yield documents suitable for declassification and public release.

If the anonymization technique you're considering would leave behind information which would require the document to remain classified at the Secret level, then it is not "proper" anonymization.

This is actually more important and relevant than you might think as post-9/11 more and more security-related Agencies need to find reliable, automatable methods of publishing (only to other Agencies, of course) the non-classified portions of their classified datasets.

Publicly available (5, Interesting)

mdsharpe (1051460) | about 4 years ago | (#31688452)

Since this is publicly available information, and all he did was send a program to go grab it (much akin to asking your web browser to download it), does this mean Facebook has essentially threatened him for no more than reading too much of Facebook too quickly? Sounds absurd to me.

Re:Publicly available (2, Insightful)

CoffeeDog (1774202) | about 4 years ago | (#31688880)

Just because something is publicly available doesn't mean just anyone is free to reproduce and distribute it. In Facebook's TOS their users agree to give Facebook rights to distribute the data they provide to them. By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.

Re:Publicly available (1)

Trepidity (597) | about 4 years ago | (#31689214)

You can't copyright facts though, so it's not clear they would own the dataset, depending on how it was created. For example, while Facebook owns the actual literal webpages on facebook.com, it's questionable whether they own the friend graph, which is simply a fact about how people choose to associate themselves.

Re:Publicly available (1)

mdsharpe (1051460) | about 4 years ago | (#31689840)

I see your points, and I guess it's the redistribution that's the main issue. Facebook clearly sees their users' activities as valuable property.

Re:Publicly available (0)

Anonymous Coward | about 4 years ago | (#31690114)

Precisely.

Facebook owns the copyright to their website. Their layout. Their programming. But your images, your "friendships", your relational data, your Twitter-like "what's on your mind" postings, messages, etc. etc. etc. are yours and yours alone. They do not belong to Facebook, and Facebook's terms and conditions, for the most part, reaffirm that.

The problem is, Facebook's not playing by their own rules.

They're essentially saying they own their users.

I could see if the guy copied Facebook's data collection methods, or ripped off anything else that's under copyright, by all means, take him down. But to say gathering anonymous data that's freely available that can be construed as common knowledge by use of a unique data collection algorithm, organizing it, and selling it to people is in violation of any law is frankly not right. Because by those terms, any stock quoting software can have many, many Fortune-500 companies suing the daylights out of them.

Crap, I think I just gave them all an idea... let's hope they don't read /.

Re:Publicly available (1)

cdrguru (88047) | about 4 years ago | (#31690560)

By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.

Legal, maybe not. But it happens every day over the entire planet. And there doesn't seem to be any reasonable way to stop it, so it is going to continue forever.

Redistribution is the key to the new digital un-economy.

Re:Publicly available (1)

NeutronCowboy (896098) | about 4 years ago | (#31689552)

Not really. It means that Facebook needs to have some data publicly available for users to browse, but that it can't let people take that data out of the Facebook realm. In other words, Facebook knows exactly what it is doing, and is acting in both cases in its best interest.

Now, does that mean that Facebook's approach makes sense, and would stand up in court? I doubt it, but I don't have the cash to test that theory. Which in turn means that the outcome was just as predictable: Facebook makes up random rules and requests, and they stand because most people don't have the resources to challenge the lawyer army of a successful corporation.

Re:Publicly available (1)

prostoalex (308614) | about 4 years ago | (#31689956)

Disclaimer: I work for the company mentioned in the article, not in legal role though.

Privacy is dynamic and "publicly available information" is not set in stone: a user could've chosen to hide specific bits of that information a few minutes later, and there doesn't seem to be any update protocol to remove those bits from the scraped DB.

Re:Publicly available (1)

mdsharpe (1051460) | about 4 years ago | (#31690456)

This is a good point. However, does a similar thing not occur in browsers caching data as one surfs the web?

Re:Publicly available (0)

Anonymous Coward | about 4 years ago | (#31690278)

If facebook wants to sell that data, then they must build up some kind of artificial ownership construct to enforce scarcity of that data. Otherwise, whoever else gathers and sells it first wins.

I don't see why they would care about things like law, public availability, or sense. They have their goal (use this data to make money), and they set out to achieve it (they must in some way come to own the data). It doesn't matter a whit to them how it's done--they simply must own the data by hook or crook.

chilling effect (5, Insightful)

Anonymous Coward | about 4 years ago | (#31688464)

Don't see Facebook going after Google, even though the data that they possess is ostensibly the same as Warden's. The primary diff that I see is that Warden was offering analysis and results for free, not trying to monetize it. Maybe that's what made them mad.

gray-market black-market (1)

h00manist (800926) | about 4 years ago | (#31688468)

All data that exists, and that someone can sell somehow, is for sale somewhere, somehow. That's the law of money, which is rather strong. So forget the right to privacy law: it hasn't been working for a long time now and there is no way to enforce it. Just like the law prohibiting drugs, it just doesn't work. I don't know the solution, or if it's good or bad, but that's the situation, like it or not. Wikileaks, for example, is a result of this.

Facebook is evil (1)

trurl7 (663880) | about 4 years ago | (#31688496)

Besides the obvious (wasting time, too much info being shared with future employers), their privacy and data policies have gotten worse and worse. Once you sign up with them, they own everything you do. Or at least so they believe. From his writing, this researcher was quite open and tried to be as forthcoming as possible. If they had concerns over anonymity, I suspect he would have been happy to discuss the exact data-scrubbing procedure to make sure it's on the level. But instead, these turds reach for the lawyers.

So it's fine for search engines to cache this data. It's fine for marketing firms to use it to pester even more people. But the moment the researchers get in on it - oh noes, gotta stop that shit from happening.

With any spare time, I'd sit down, recreate the damn dataset and post it to every torrent site in the world. Let's Streisand these jerks!

So (1)

fulldecent (598482) | about 4 years ago | (#31688550)

(not that it was actually destroyed), but why destroy the dataset? Just post to slashdot, wait for someone to send you a link to chilling effects or eff, then follow up with chilling effects or eff, then release the dataset.

Very interesting (2, Informative)

Bearhouse (1034238) | about 4 years ago | (#31688552)

I'll let others debate the 'privacy' issues; (personally I think there's nothing wrong with scraping profile information that people have explicitly made 'public')
Anyways, just check what he did with it; very interesting: (FTA)
http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html [typepad.com]
There must be many, many legit uses this data could be put to... shame it's being killed by NIH syndrome

Re:Very interesting (1)

dangitman (862676) | about 4 years ago | (#31689000)

There must be many, many legit uses this data could be put to... shame it's being killed by NIH syndrome

By "NIH syndrome," I assume you're referring to "Not Invented Here." I don't really see what that has to do with this case.

Re:Very interesting (1)

Bearhouse (1034238) | about 4 years ago | (#31690242)

Correct on NIH.
Well, if they were smart, Facebook would already be marketing this data, and/or services based on it, to their users and others.
One could imagine all kinds of apps; "hey, 20% of your friends are in town 'x', why not go there for a weekend"
The links to business could be huge, too...
"Hey, here's a hotel you could stay in..."
If they proposed those kinds of things, instead of asinine games, then maybe I'd be prepared to take them more seriously, (and not have a problem with their using my 'public' data...)

Re:Very interesting (1)

dangitman (862676) | about 4 years ago | (#31690538)

The thing is, that I just don't understand why you would use "NIH Syndrome" in this context. That is usually used when somebody in Company X says "Hey, why don't we use this awesome technology to make a better product," but is rebuffed by Company X because the technology was invented by company Y.

In this example, there is no new technology involved, and Facebook already has the data. What is "not being invented here"? Facebook already invented Facebook, how is Facebook using the data they generated inventing something that Facebook didn't already invent?

Facebook.

Facebook does stuff like this a lot (5, Interesting)

TheSpoom (715771) | about 4 years ago | (#31688564)

They did something similar to FB Purity [fbpurity.com], a Greasemonkey script that allows users to filter out apps and other stuff they don't want to see in their feed. Facebook argued that they were misusing their "FB" trademark... eventually they let them continue under the name "fluff busting purity", probably due to the PR backlash that shutting them down would bring.

They've also shut down the Facebook portion of the Web 2.0 Suicide Machine [suicidemachine.org], which runs scripts that allow a user to delete their social profiles as thoroughly as sites will allow. In that case, they argued that the Suicide Machine was violating their "Statement of Rights and Responsibilities"... which isn't even a law! Nonetheless, the Suicide Machine didn't have the financial ability to fight even frivolous claims like that, so they folded that section.

Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.

Re:Facebook does stuff like this a lot (0)

Anonymous Coward | about 4 years ago | (#31688700)

They're not.

Re:Facebook does stuff like this a lot (5, Insightful)

Anonymous Coward | about 4 years ago | (#31688800)

They're not wrong though. People on FB constantly get outraged at new policies, interfaces and features, but I don't know of anyone who has actually left the site. I am just as bad myself; all I've done is remove everything from my profile and just use it as a hub to stay in contact with people all around me, I haven't gone as far as stopping using the site, and I don't think I will. Nor will many people.

Re:Facebook does stuff like this a lot (1)

ZekoMal (1404259) | about 4 years ago | (#31689664)

I left the site. Well, I tried to. At first, they told me that I could only "suspend" the account; ie, people could still send me stuff and FB kept ALL of my data. Outraged, I tried to find an alternative.

Surprise, surprise. After digging through their FAQ I found an obscure part of it that said you could permanently delete. Here's the problem with it. After you agree to permanently delete, it stays up for two weeks. If you log in even once, it undoes the delete option. Furthermore, there is no guarantee anywhere that your data is actually gone.

I'm never one to scream "sue"...but if I can't confirm that my data is off of their useless website, I am fucking suing.

Re:Facebook does stuff like this a lot (0)

Anonymous Coward | about 4 years ago | (#31689680)

I left. No sweat off my back.

Re:Facebook does stuff like this a lot (0)

Anonymous Coward | about 4 years ago | (#31690352)

Even better... not only do people not leave Facebook, they protest the change in policies by making Facebook groups about how they hate Facebook. (And I think it's safe to say that Facebook doesn't care if those groups exist as long as the users stick around.)

Re:Facebook does stuff like this a lot (1)

CAIMLAS (41445) | about 4 years ago | (#31690664)

It's probably something to do with the fact that: eh, you can:

1) leave the site and have them keep all the data, while at the same time not be able to view your friends' profiles again
2) stay

Re:Facebook does stuff like this a lot (2, Insightful)

flabordec (984984) | about 4 years ago | (#31688892)

Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.

I'm afraid the average Facebook user is a teen who is more worried about getting a higher score in whatever Flash game she is currently playing than about FB's access policies for computers.

Re:Facebook does stuff like this a lot (1)

ZekoMal (1404259) | about 4 years ago | (#31689798)

This. I tried to convince three friends to quit FB, and they were vehemently against it.

Three different reasons given:

1. I have nothing to hide, so why not share everything with everyone?

2. My privacy settings are on, so it's okay.

3. I don't care, I want to keep in touch with my friends that live in the same dorm that I also text obsessively and eat every meal with.

My generation is as anti-privacy as they are anti-copyright; they hate the establishment but love giving said establishment all of their data.

On what grounds? (1)

adam.skinner (721432) | about 4 years ago | (#31688600)

Legal action? On what grounds, and for what damages? What did this guy have to fear? Jail time? Court imposed fines? He doesn't need a lawyer to defend him in this.

Re:On what grounds? (1, Interesting)

Anonymous Coward | about 4 years ago | (#31689226)

This is America; defending yourself in court against a lawyer is legal suicide. I could argue that Cyanide is lethal and Dynamite is combustible in an American Court, but if I were up against a lawyer I guarantee I would lose. Even though these are practically indisputable facts, the American Court System is set up so it is impossible to argue respectably without paying the Lawyer Tax.

Example:
1.) I go into court and argue that Cyanide Brand X should carry a "Poison" label.
2.) Theoretical makers of Cyanide Brand X hire 5 lawyers, because they can.
3.) Lawyers state that as defendant they wish to have a trial by jury (a right guaranteed by the constitution, called a Jury of your Peers)
4.) Jury selection weeds out anyone with previous knowledge of the effects of Cyanide, and anyone with a background in biology or chemistry, because they would not be impartial.
5.) The result is a jury of people who are completely unknowledgeable and as such completely persuadable either way.
6.) The Lawyers of Cyanide Brand X bring in a variety of "Expert Witnesses" who are of course "compensated for their time" and who state that no, Cyanide doesn't kill you.
7.) Because the Jury is 100% impartial and also 100% uninformed besides what they have been told in court, their only choice is to assume these Paid or Compensated "Expert Witnesses" were correct because they are scientists!
8.) The result is that I lost a case arguing what should have been a foregone conclusion to begin with, because somebody brought more money and lawyers than I did.

Re:On what grounds? (2, Informative)

cdrguru (88047) | about 4 years ago | (#31690466)

If your position in entering the above motion was that "I'm right, so I should win" and offered nothing else - such as expert witnesses of your own, you are going to war unarmed. Of course you are going to lose.

The adversarial system is based on the idea that you have to defend your position. Ranting that "I'm right" doesn't count for much - presenting facts, witnesses, expert testimony, etc. is what counts. And doing so in the proper format for the court.

You are mostly correct that a lawyer would know these things and how they are done in court. Therefore, yes, almost always a lawyer is required, if for no other reason than to get through the proper procedural format of the court process. You want to do it yourself? You better spend some time learning how it is done, what is required to win and how to get there. Without that education, it is like taking someone that doesn't know computer programming and having them debug a program in an Assembler language.

Don't have the time to learn all this stuff? Well, that is why we have lawyers.

You see, Facebook doesn't only control your... (1)

Jalfro (1025153) | about 4 years ago | (#31688724)

personal information - they own it!

Re:You see, Facebook doesn't only control your... (2, Insightful)

NeutronCowboy (896098) | about 4 years ago | (#31689586)

Someone ought to mod this up. Facebook's only value is in the information you provide to Facebook about who you are, where you live and who your connections are. As a result, they will defend that little nugget as if their life depended on it - because it does.

Like hell he deleted it though. (0)

Anonymous Coward | about 4 years ago | (#31688870)

He'll have a few recordable DVDs lying around somewhere to use when FB eventually dies or he thinks enough time has passed to anonymously float the data out on a torrent.

Don't worry... (3, Interesting)

turbotroll (1378271) | about 4 years ago | (#31688876)

Somebody else will do it again, this time anonymously and with an evil robot that hides its tracks. It only takes perl, LWP, MySQL, tor and a little time and imagination to do so.

Fuck you, Zuckerberg.

Re:Don't worry... (0)

Anonymous Coward | about 4 years ago | (#31689338)

That's an awfully specific set of tools. Don't you think you could have gotten your point across without resorting to dropping names of your pet tools?

WHAT TOS? (0)

Anonymous Coward | about 4 years ago | (#31689052)

Quote: Facebook claimed he had violated its terms of service

As I understand it the information was openly available and therefore does not require you to use Facebook friend requests to get it. I fail to see how Facebook can impose a TOS on someone who accesses the site but does not use the service.

Is it assumed I agree to the TOS of Yahoo.com by visiting the frontpage? Is it assumed I agree to the TOS of any website by just visiting, even though they may not have explicitly stated I have agreed to it? If I can make people agree to a TOS without their knowledge, then I am going to file a lawsuit against Facebook claiming they owe me $1,000,000 because it is in the TOS right here on my desk about them using my data.

Clue to Pete Warden. (0)

Anonymous Coward | about 4 years ago | (#31689106)

Twilight was written by a Morman Author. That's why it shows up in your morman section. Apparently writing a script to scrape facebook profiles is easy research, but not looking up an entry in wikipedia.

http://en.wikipedia.org/wiki/Stephenie_Meyer [wikipedia.org]

Re:Clue to Pete Warden. (0)

Anonymous Coward | about 4 years ago | (#31689572)

Morman? I thought it was written by a Merman!

Interesting data (0)

Anonymous Coward | about 4 years ago | (#31689740)

Ignoring the legality of it for a moment. What sort of questions can we ask and answer with the facebook data? Look how he has managed to divide the US into groups based on who is friends with who? That's a very interesting way of dividing up a country! StayAtHomeIa. Haha.

I for one, wish the entire facebook profile database was made public (with personal identifiable information removed). The benefit to researchers would be immeasurable.

RTFA (1)

Chees0rz (1194661) | about 4 years ago | (#31689744)

This is one case I am glad I RTFA. The dataset is destroyed, but there is still a neeto little web application to play with. It's fun to poke around with... I find myself wanting more.

And of course facebook wanted to shut him down... this is probably data they are collecting themselves and are selling / want to sell :)

This is data, not protected by copyright (1)

digitalgimpus (468277) | about 4 years ago | (#31690752)

I'm not sure copyright law even applies here. No more than it applies to say Google or Yahoo. He scraped DATA from a publicly accessible website as permitted by the robots.txt file. How is this really any different than what Google or Yahoo does? Perhaps the distribution? Though that's hardly significant in this case as the data is already out there. He just organized the presentation. Sounds to me like Facebook just pushing buttons to try and avoid another privacy controversy. /IANAL //Don't use facebook, I'm aware what companies are scraping and misusing what they sniff all too well.
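For anyone wondering what "permitted by the robots.txt file" means mechanically, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example-crawler` user agent, and the example.com URLs are all made up for illustration; this is not Facebook's actual file and not Warden's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, NOT Facebook's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def may_fetch(robots_txt, user_agent, url):
    """Check whether the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_fetch(ROBOTS_TXT, "example-crawler", "http://example.com/people/someone"))  # True
print(may_fetch(ROBOTS_TXT, "example-crawler", "http://example.com/private/page"))    # False
```

A well-behaved crawler runs a check like this before each request. Whether passing it settles anything about a site's terms of service is exactly what's being argued in this thread.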