Frequent Slashdot contributor Bennett Haselton writes "Facebook indirectly accused Google of creating dummy accounts to log in and spider information from their site, and Google denied the charge. But if Google wants to help users discover what strangers can find out about them, then spidering Facebook with dummy accounts is exactly what they should be doing." Read on for the rest of his thoughts.
In the dust-up over the revelation that Facebook had paid a PR firm to plant negative stories about Google indexing Facebook's site, one point was often overlooked: the allegation that Google had been creating dummy Facebook accounts, and using them to log in to Facebook and spider information that was only available to Facebook users. This was denied by Google and never proven, but the denial obscures a more important point. Paradoxically, rather than hurting user privacy, it would have helped to protect user privacy in the long run if Google actually had been logging in to Facebook, spidering the information that was available to members, and making that information available in Google search results.
To review the facts not in dispute: When you create a Facebook profile, Facebook by default makes certain categories of information viewable to other users. Most of your personal information (in particular, your contact information) is viewable to other members that you confirm as your Facebook friends. A narrower set of information — usually including your name and your interests, but not including your contact information — is viewable to other Facebook members who are signed in to Facebook, but who are not in your friends list. (Let's call this the "Facebook stranger" version of your profile.) Finally, since 2007 Facebook has made an even smaller subset of information available in a "public search listing," which can be viewed without being logged in to Facebook or even having an account. Facebook explicitly stated that one reason for creating these public search listings was to make the profiles more easily findable by Google.
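To make the three tiers concrete, here is a minimal Python sketch of the visibility model just described. It is an illustration, not Facebook's actual implementation: the `Viewer` names, the field names, and the assumption that the public search listing shows only a name are all hypothetical, chosen to match the description above.

```python
from enum import Enum


class Viewer(Enum):
    FRIEND = "friend"      # a confirmed Facebook friend
    STRANGER = "stranger"  # logged in to Facebook, but not a friend
    PUBLIC = "public"      # not logged in (sees the public search listing)


# Hypothetical field sets; the real ones depend on Facebook's defaults
# and on each user's privacy settings.
VISIBLE_FIELDS = {
    Viewer.FRIEND: {"name", "interests", "contact_info"},
    Viewer.STRANGER: {"name", "interests"},
    Viewer.PUBLIC: {"name"},  # assumption: the listing shows the name only
}


def view_profile(profile, viewer):
    """Return only the fields of `profile` that `viewer` may see."""
    allowed = VISIBLE_FIELDS[viewer]
    return {key: value for key, value in profile.items() if key in allowed}


# A made-up profile used only for illustration.
profile = {
    "name": "Alice Example",
    "interests": "chess",
    "contact_info": "alice@example.com",
}

stranger_view = view_profile(profile, Viewer.STRANGER)
public_view = view_profile(profile, Viewer.PUBLIC)
```

In this sketch, the "Facebook stranger" view is whatever `view_profile(profile, Viewer.STRANGER)` returns: more than the public listing, less than what a friend sees.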
Now, the op-ed that Burson-Marsteller was trying to plant in the press strongly suggested that Google was using tactics like creating fake Facebook accounts in order to log into Facebook and scrape the "Facebook stranger" version of people's accounts, and not just the public search listings. (For one thing, the op-ed accused Google of likely "violating the Terms of Service" of Facebook. While scraping the public search listing obviously doesn't violate the TOS, creating dummy accounts to log in to Facebook and spider content automatically certainly does — and that's the only thing Google could do on Facebook beyond spidering the public search listing.) Of this allegation, Wired senior writer Steven Levy wrote:
This information is a lot easier to unearth from inside Facebook, but actually logging into Facebook to purloin information would indeed be troublesome. For one thing, it would violate the terms of service agreement. Is Google doing this? One of the Burson operatives implied that it is. But Google says the company does not go inside Facebook to scrape information, and I find this credible. (If Facebook has logs to prove this serious charge, let's see them.)
But why is this such a scurrilous charge anyway?
When you search for a person's name on Google, you might be looking for information about that person, or you might be doing research on what other people in the world can find about that person (particularly if that person is yourself). If a certain fact about you (for example, the members of your Facebook friends list) is viewable to anyone with a Facebook account as long as they're logged in to Facebook, then anybody in the world can obtain that information about you anyway, by getting their own Facebook account. So it's perfectly legitimate for Google to report that as a fact that anyone can find about you, if you Google your own name. You may not like the fact that Facebook exposes that information about you to anyone with a Facebook account, but it's Facebook, not Google, that makes the information available to anyone. If you Google your own name and Google tells you that some piece of information is available to any Facebook user, Google is doing you a favor.
For that matter, it's not that easy to view your own "stranger Facebook profile" on Facebook, to see for yourself what other users can see about you. You can't just click your own profile while signed in, since that will show you all of your own personal information. You can't sign out and then click your own profile, since that will show you your public search listing (which is shown to non-logged-in users). You would have to, instead, create a second dummy Facebook account (already a violation of Facebook's TOS), which usually requires creating a second email address that you can tie to your second Facebook account, then signing in with your second account and trying to view your "real" one... How many people — even the most privacy-conscious ones who pore over every article about Facebook allegedly exposing their data — have ever tried that experiment? Having the information already spidered by Google would make it much easier.
When would you actually derive some privacy benefit from not having your "Facebook stranger" profile information listed in Google? Really, only if you're being looked up by a particularly lazy stalker who searches your name on Google — but then doesn't even bother signing in to Facebook and searching for your name on Facebook. If they're motivated enough to find you on Facebook and view your "Facebook stranger" profile there, then you've gained nothing by blocking that information from Google.
Notice this argument does not extend to some general principle that webmasters shouldn't be able to tell Google not to index parts of their website. Many websites have specified, using the Robots Exclusion Standard, that they don't want Google indexing certain documents on their site. (The Robots Exclusion Standard allows webmasters to create a file called robots.txt on their website, which tells search engine crawlers which paths on the site they should not fetch. Compliance is voluntary: it would be technically possible for a search engine to ignore that directive and index the documents anyway, but virtually all search engines do follow it.) In that scenario, even if a document excluded by robots.txt contains personal information about someone, there's no argument that "someone could find it anyway by searching, so Google is doing you a favor by listing it," because nobody would be able to find it by searching unless Google lists it. What makes Facebook a special case is that (a) it has its own search function, and (b) more importantly, it's already the place that everybody knows to go looking if they're searching for a person. These two facts mean that people can find you on there without Google's help.
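For readers who have never looked inside a robots.txt file, here is a minimal sketch of how a well-behaved crawler consults one, using Python's standard `urllib.robotparser` module. The robots.txt contents, the `ExampleBot` user agent, and the example.com URLs are made up for illustration; nothing here reflects Facebook's or Google's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/private/profile.html")
allowed = may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/public/index.html")
```

The check happens entirely on the crawler's side, which is why the standard works only as long as search engines choose to honor it.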
That might sound unfair to Facebook — that simply because they've achieved success, different rules should apply to them, and Google ought to be allowed to violate their TOS by logging in to their system and spidering people's Facebook-stranger information. But it's the only way for Google to display honest answers, if a user comes to Google to ask: What can strangers on the Internet find out about me?
P.S.: I received many useful suggestions in response to a previous article, in which I described an algorithm for crowdsourcing the abuse-complaint-review process on Facebook, and offered a $100 prize split between users who sent in the best criticisms or improvements. So I'm going to do it again in a more free-form approach: I'll offer a $50 prize to be split between readers who email me the best negative comment or counterargument to the argument that I've just made here. Entries have to be submitted by email, although of course you can and should post your thoughts in the comment threads as well. Email bennettSPAMMERS at SUCKpeacefire dot org with "googlebot" in the subject. You can also donate your winnings to a charity of your choice.