Published Google Docs To Appear In Search Engines

kdawson posted more than 4 years ago | from the expectation-of-privacy dept.

Google 62

dotancohen writes "Google plans to make all published documents from Google Docs users crawlable, if the documents are linked from a public Web site. No official announcement appears to have been made, just a short blog post on the subject by a Google employee in a help forum. (One comment on the ghacks.net post linked above says that email was sent to the admins of Google Apps accounts.) There does not seem to be any way to make an individual document not crawlable; you can only un-publish it, at which point Web links to it will not work any more." The move makes sense from one point of view — Google is just making crawlable a document linked from another crawlable document — but it's likely to catch a lot of people by surprise.

62 comments

Summary is wrong (5, Informative)

sopssa (1498795) | more than 4 years ago | (#29508375)

Neither the summary nor the article covers every aspect of this. For a better article, see The Register [theregister.co.uk]. "Google plans to make all published documents from Google Docs users crawlable, if the documents are linked from a public Web site." is wrong.

This only applies to files explicitly published using the suite's "publish as web page" or "publish/embed" options and linked to from a public webpage. This does not apply to files shared via the "Allow anyone with the link to view (no sign-in required)" option, which provides for document sharing without links to the public web.

So it's not really as bad as it sounds. You have to explicitly publish them as a webpage, which at least to me suggests that they might get indexed as well, even more so if they are linked to from other websites.

The good thing Google could do here is add an explicit warning or small text under the publish option saying that content you publish as a webpage might be indexed by search engines as well. Other than that I don't see a problem with this, as the users are explicitly publishing them.

Re:Summary is wrong (5, Insightful)

AvitarX (172628) | more than 4 years ago | (#29508453)

I'm actually shocked they weren't already.

I mean, that's what Google does: it indexes things.

Why would I expect a Google doc I link to to be treated any differently than, say, a PDF doc I link to?

I really just took it for granted that it was searchable.

Re:Summary is wrong (1)

john83 (923470) | more than 4 years ago | (#29508611)

Me too. The summary says this may take a lot of people by surprise, but I think very few people will have determined that those documents aren't searchable once they're linked to externally.

Re:Summary is wrong (1)

AliasMarlowe (1042386) | about 5 years ago | (#29513197)

The summary says this may take a lot of people by surprise,

It's surprising how many people are surprised by obvious things.

Re:Summary is wrong (1)

pbhj (607776) | about 5 years ago | (#29513513)

The summary says this may take a lot of people by surprise,

It's surprising how many people are surprised by obvious things.

I'm surprised at that.

Re:Summary is wrong (1)

megamerican (1073936) | more than 4 years ago | (#29509037)

Google probably already did and used it for internal use only and are now just making it publicly available. If Google came out right away and did this, even in the limited way they are, it could easily dissuade people from using the service.

Re:Summary is wrong (2, Funny)

jcdill (6422) | about 5 years ago | (#29511689)

I'm shocked too. "D'oh, Google indexes publicly linked files? Who would have thought of such a thing?"

This won't take a "lot of people" by surprise but it might take a "lot of stupid people" by surprise. Which is not surprising.

Re:Summary is wrong (5, Insightful)

stocke2 (600251) | more than 4 years ago | (#29508455)

It seems to me that if you are purposely publishing it as a webpage and linking to it from a public website, you would want it to be crawled.

I got the email from google since I admin two google apps domains, and have no problem with it. We don't normally publish docs like this, but if I did it would be because I wanted them found.

I am sure a lot of people on here are going to go overboard like they always do because it is google, but it is not going to expose all of your private docs.

Re:Summary is wrong (0, Offtopic)

Dan541 (1032000) | about 5 years ago | (#29512039)

How about the problem that allowed Gmail users to see each other's inboxes? Will that also happen with Docs?

Re:Summary is wrong (1)

stocke2 (600251) | about 5 years ago | (#29524733)

Try actually reading that story. They couldn't read each other's email per se; it was an error in migrations that put migrated emails in the wrong account.

Re:Summary is wrong (1)

Dan541 (1032000) | about 5 years ago | (#29526869)

It makes no difference how they did it. That's like saying they used a sledgehammer instead of an axe to break in.

They still violated their users' privacy and took an incredibly long time to fix it.

Re:Summary is wrong (0, Funny)

Anonymous Coward | more than 4 years ago | (#29508481)

Come on, sopssa, don't let little things like the facts get in the way of a good, sensational story!

Re:Summary is wrong (1, Funny)

bertoelcon (1557907) | about 5 years ago | (#29512365)

don't let little things like the facts get in the way of a good, sensational story!

This isn't a troll, it is actually the slashdot editor creed.

Re:Summary is wrong (1)

poetmatt (793785) | more than 4 years ago | (#29508617)

I *think* they already let you know it can be crawled, now it's just being more explicit. When you press publish there is more than just a single confirmation to an extent too, so it's not like "whoops". The links pretty much say "only allow registered users to see this/anyone can see this/etc" so you can control whether your link is for all or for a specific person.

Thus, this "change" pretty much doesn't change anything.

Re:Summary is wrong (2, Insightful)

Ractive (679038) | more than 4 years ago | (#29509017)

Posted by kdawson on Tuesday September 22, @03:20PM

Re:Summary is wrong (1)

fermion (181285) | more than 4 years ago | (#29509023)

First sentence of the summary implies this:
...make all published documents from Google Docs users crawlable, if the documents are linked from a public Web site

This means that the document is accessible to the public, and any web crawler can get to it, assuming that the crawler indexes Google documents and that Google does not block these crawlers. If Google is not indexing the documents, then Google is not doing its job, which is to index all online material.

Of course some firms that are looking for free hosting and do not understand the internet will object, as their 'private documents' now show up in searches. But this is just like some news organizations objecting that search engines index and link to content, or some lame firms objecting that people deep link. There are those that understand the technology, and those that choose to think technology does what they wish.

I do see this as different from the case of older versions of MS Windows automatically exposing documents on personal computers to the world. In that case there was an expectation for privacy. In this case there is not.

Re:Summary is wrong (1)

Aldenissin (976329) | more than 4 years ago | (#29509519)

Of course some firms that are looking for free hosting and do not understand the internet will object, as their 'private documents' now show up in searches. But this is just like some news organizations objecting that search engines index and link to content, or some lame firms objecting that people deep link. There are those that understand the technology, and those that choose to think technology does what they wish.

I do see this as different from the case of older versions of MS Windows automatically exposing documents on personal computers to the world. In that case there was an expectation for privacy. In this case there is not.

To your first point, the question I have is this: what is a company to do that has a private intranet and doesn't want a "published" document open for everyone to see? I know that MS Office reader software is prevalent in my workplace on the call center floor, and this now definitely presents an obstacle to using Google Docs in our environment. And to your second, I would agree that it is different.

Re:Summary is wrong (1)

pbhj (607776) | about 5 years ago | (#29513505)

The good thing Google could do here is add an explicit warning or small text under the publish option saying that content you publish as a webpage might be indexed by search engines as well.

You're one of those guys that calls for rear-view mirrors to say "objects in the mirror may be behind you" aren't you?

I mean blimey *news flash* "this just in, stuff on the web may be indexed by search engines, full story after this".

Google notified users by email (5, Informative)

Anonymous Coward | more than 4 years ago | (#29508397)

At least for apps administrators, the following email was sent out with instructions on how to prevent this:

*****

Hello Google Apps admin,

We wanted to let you know about some important changes around published documents, spreadsheets, and presentations.

In a few weeks, documents, spreadsheets and presentations that have been explicitly published outside your organization and are linked to from a public website will be crawled and indexed, which means they can appear in search results you see on Google.com and other search engines. There is no change for documents published inside your organization or shared privately.

If you wish to prevent users from publishing documents to the public internet, we now offer an admin control in the Google Apps Control Panel that allows users to continue to 'share documents outside the domain' without allowing them to publish the files to the public Internet. To change this setting, follow these steps:

- Login to your admin control panel
- Select Service Settings > Docs
- Un-check the option 'Users can publish documents to the public internet'

If a user does not want their published Docs to be crawled, then the user must unpublish them by doing the following:

- Go to the 'Share tab'
- For documents and spreadsheets, choose 'Publish as web page'. For presentations choose 'Publish/embed'
- Click on the button that says 'Stop publishing'

For more details, please see this Help Center article: http://www.google.com/support/a/bin/answer.py?hl=en&answer=60781 [google.com]

This is a very exciting change as your published docs linked to from public websites will reach a much wider audience of people!

Sincerely,

The Google Apps Team

Email preferences: You have received this mandatory email service announcement to update you about important changes to your Google Enterprise product or account.

Google Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043

Necessary people Notified !!! (4, Informative)

Lordy2001 (951056) | more than 4 years ago | (#29509041)

As an admin of multiple Google Apps sites, I was sent that email for each Apps site I administer. I don't see why the summary implies that there were no notifications, but this being Slashdot I am not surprised.

No way! (5, Funny)

bmetzler (12546) | more than 4 years ago | (#29508417)

You mean things available on the internet might be indexed by Google? Holy Cow! I wonder if other search engines also do this "indexing" thing. Mysterious and curious activities for sure, I say.

Re:No way! (5, Funny)

Anonymusing (1450747) | more than 4 years ago | (#29508643)

I know! So much for Google's motto of "Don't be evil". Obviously they mean "except when indexing publicly accessible web links"! Those hypocrites!

Re:No way! (1)

SanityInAnarchy (655584) | about 5 years ago | (#29511517)

Those were my thoughts exactly.

it's likely to catch a lot of people by surprise.

Yeah, well, so is the fact that if they upload personal stuff to the Internet, people might find it on the Internet. If this wasn't covered by whoever introduced you to the Internet, that's a bit like handing a toddler a knife without telling them to be careful.

Wait.. (5, Funny)

R2.0 (532027) | more than 4 years ago | (#29508443)

Are you saying that I can't publish a document on the Web but limit who sees it?

That's an invasion of my privacy! Next thing you know you'll be saying I can't stop people watching me bang my wife in my front yard!

Re:Wait.. (3, Funny)

Anonymous Coward | more than 4 years ago | (#29508519)

Next thing you know you'll be saying I can't stop people watching me bang my wife in my front yard!

We will gladly help you to keep perverts at a distance, but you must give us the address of the front yard.

Re:Wait.. (3, Funny)

selven (1556643) | more than 4 years ago | (#29509321)

1600 Pennsylvania Ave, Washington DC 20500

Re:Wait.. (1)

DaleCooper82 (860396) | about 5 years ago | (#29511241)

Next thing you know you'll be saying I can't stop people watching me bang my wife in my front yard!

Did you consider banging her inside the house?

So? (1)

nweaver (113078) | more than 4 years ago | (#29508457)

I'd hope that Bing would already crawl these documents. To NOT do so is an oversight.

Re:So? (1)

R2.0 (532027) | more than 4 years ago | (#29508527)

I had a coworker use "bing" as a verb today on a concall. I "corrected" her.

Re:So? (2, Funny)

CastrTroy (595695) | more than 4 years ago | (#29508687)

Not as bad as Scott Hanselman [hanselman.com] saying "I'm going to Google that on Bing." Can't remember what the date was, but he said it on his podcast.

Re:So? (4, Funny)

natehoy (1608657) | more than 4 years ago | (#29508863)

If you don't remember the date, you can always Bing it on Yahoo!

Re:So? (1)

patrickthbold (1351131) | more than 4 years ago | (#29509701)

I prefer the term "Bang" when referring to the verbification of "Bing".

Re:So? (1, Funny)

Anonymous Coward | more than 4 years ago | (#29510157)

I like the term "Bung" as the past tense.

Re:So? (1)

AnalPerfume (1356177) | about 5 years ago | (#29511947)

For a second I read that as "bring it on Yahoo" meaning for Yahoo to now match it. I keep forgetting about MS-Search-Rebranded (version whatever it is now).

Re:So? (1)

maxume (22995) | more than 4 years ago | (#29509317)

Did someone immediately punch him in the Alta Vista?

Re:So? (0)

Anonymous Coward | more than 4 years ago | (#29508891)

My inner seventh grader is picturing a sentence like "I binged your mom last night."

Re:So? (2, Informative)

selven (1556643) | more than 4 years ago | (#29509329)

Besides, what is the past tense of "bing"? Is it "bang", as in "I don't know much about your mother so I bang her"?

Re:So? (0)

Anonymous Coward | about 5 years ago | (#29516441)

"bing" already is the past tense. The past tense of "to google"

stupid irregular verbs...

Re:So? (0)

Anonymous Coward | about 5 years ago | (#29511969)

I "had" a coworker "use" "bing" as a verb today on a concall. I "corrected" her.

fixed that for you.

Re:So? (1)

geminidomino (614729) | about 5 years ago | (#29513873)

No, you didn't

The notification I received (0, Redundant)

ISurfTooMuch (1010305) | more than 4 years ago | (#29508461)

I administer a domain using Google Apps, and I got a notification by e-mail a few days ago. Here is the text of it:

Hello Google Apps admin,

We wanted to let you know about some important changes around published documents, spreadsheets, and presentations.

In a few weeks, documents, spreadsheets and presentations that have been explicitly published outside your organization and are linked to from a public website will be crawled and indexed, which means they can appear in search results you see on Google.com and other search engines. There is no change for documents published inside your organization or shared privately.

If you wish to prevent users from publishing documents to the public internet, we now offer an admin control in the Google Apps Control Panel that allows users to continue to 'share documents outside the domain' without allowing them to publish the files to the public Internet. To change this setting, follow these steps:

- Login to your admin control panel
- Select Service Settings > Docs
- Un-check the option 'Users can publish documents to the public internet'

If a user does not want their published Docs to be crawled, then the user must unpublish them by doing the following:

- Go to the 'Share tab'
- For documents and spreadsheets, choose 'Publish as web page'. For presentations choose 'Publish/embed'
- Click on the button that says 'Stop publishing'

For more details, please see this Help Center article: http://www.google.com/support/a/bin/answer.py?hl=en&answer=60781 [google.com]

This is a very exciting change as your published docs linked to from public websites will reach a much wider audience of people!

Sincerely,

The Google Apps Team

Don't release information on the internet... (1)

fyrewulff (702920) | more than 4 years ago | (#29508463)

Public, linked data able to be recorded, indexed

News at 11

(In other words, non-story. If you don't want something indexed... don't link it on the public internet.)

OH MY FUCKING GOD! (0)

Anonymous Coward | more than 4 years ago | (#29508697)

You mean a web-enabled document that I specifically published to the internet using a web search company's service is getting indexed by said company and will be available for searches?

I NEVER SUSPECTED!

DOWN WITH GOOGLE FOR DOING WHAT THEY DO!

I'm aghast! (5, Insightful)

natehoy (1608657) | more than 4 years ago | (#29508767)

I'm actually surprised that, so far, no one has misinterpreted this as "all your Google Docs are belong to our search engine" along with a few jihadist vows to delete all data from Google immediately. Instead, everyone seems to have read the article and understood that these documents already should have been indexed, because the users published them on a web site the public has access to.

Who are all of you people, and what have you done with my Slashdot????

Re:I'm aghast! (0)

Anonymous Coward | more than 4 years ago | (#29508953)

Have you tried browsing at another score threshold?

Re:I'm aghast! (1)

natehoy (1608657) | more than 4 years ago | (#29509527)

Interestingly, the post I am currently replying to is the only "zero" post at this moment. :)

There was one troll, but I don't feed them.

Re:I'm aghast! (2, Funny)

mewsenews (251487) | more than 4 years ago | (#29509383)

Heh.. umm.. heh.. *pushes glasses up nose* I'm supposed to .. umm ... the jihad is alive and well, heh .. but

*portly neckbeard appears*

what?

*whispering*

ok

*portly man exits*

ok anyway, we're taking it easy on Google this time and if you want to disagree with us you are required to accuse us of having rose-coloured glasses and being on a "honeymoon" with them.

If you accuse us of reading the article again, we're not.. umm.. there will be.. problems waiting for you, in fact I might not be surprised if you woke up and found a snarky reply attached to one of your comments. You do not mess with us.

*wipes cheeto crumbs from shirt*

Yeah.

*exits*

Re:I'm aghast! (1)

natehoy (1608657) | more than 4 years ago | (#29509481)

Excellently painted image, sir! LOL!

Re:I'm aghast! (1)

MobileTatsu-NJG (946591) | about 5 years ago | (#29510721)

I'm actually surprised that, so far, no one has misinterpreted this as "all your Google Docs are belong to our search engine" along with a few jihadist vows to delete all data from Google immediately. Instead, everyone seems to have read the article and understood that these documents already should have been indexed, because the users published them on a web site the public has access to.

Who are all of you people, and what have you done with my Slashdot????

It's the same people. It's just that this story was about Google and not Apple.

Re:I'm aghast! (1)

Anarchduke (1551707) | about 5 years ago | (#29511587)

It's the same people. It's just that this story was about Google and not Microsoft.

Fixed that for ya.

Re:I'm aghast! (1)

DrWho520 (655973) | about 5 years ago | (#29514417)

Just give it an hour or two. The basement dwellers are still wrapped in the throes of slumber. But soon, they will throw off the confines of their StarBlazers comforter, imbibe their hideous concoction of cold pizza and Cocoa Puffs and ready themselves for an assault upon the stone walls of reason! Beware their +16 Rashness tinfoil hats! They make them impervious to reason!

What's the big deal? (1)

wiredlogic (135348) | more than 4 years ago | (#29508855)

They already fully index Office documents and PDFs. This shouldn't cause anyone alarm. I don't know how the publishing mechanism works, but one would presume that there is still the option of using robots.txt to lock Googlebot out of publicly accessible sites.
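[Editor's note: for readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved crawler decides whether it may fetch a URL, using Python's standard urllib.robotparser. The robots.txt contents, the example.com URLs and the can_crawl helper are hypothetical, made up for illustration. Note that robots.txt is served by the host of the document, so this only helps on a site you control; it says nothing about what Google serves for published Docs.]

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site you control (paths are made up).
# Googlebot is kept out of /private/; every other crawler may fetch anything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/private/doc.html"),
        ("Googlebot", "https://example.com/public/doc.html"),
        ("OtherBot", "https://example.com/private/doc.html"),
    ]:
        print(agent, url, can_crawl(ROBOTS_TXT, agent, url))
```

[robots.txt is only a request that cooperative crawlers honor; it is not access control, so anything that must stay private should not be published at a public URL in the first place.]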

Re:What's the big deal? (1)

Kalriath (849904) | more than 4 years ago | (#29509171)

What? We're talking about Google Docs. No, you can't use robots.txt. What you can do is not publish documents you don't want read.

Good (4, Interesting)

Tsiangkun (746511) | more than 4 years ago | (#29508887)

This does nothing to the docs I've shared with other gmail users or people with google accounts, while enhancing the web presence of the documents I've explicitly published as web pages.

I expect web pages to be crawled, indexed, and searchable.
I see this as a good thing.

Re:Good (1)

fulldecent (598482) | about 5 years ago | (#29514505)

I agree.

Also, I expect a black hole will open up trolls on any document set with "Allow anyone to edit this document" enabled.

Perposterous! (5, Funny)

not already in use (972294) | more than 4 years ago | (#29508935)

I for one am filled with feigned outrage, because the way Slashdot presents this article dictates I be!

Viruses (1, Interesting)

ACMENEWSLLC (940904) | more than 4 years ago | (#29508993)

A large number of the e-mail viruses I see have links to a Google Groups site, which then plays a video or has other embedded content that utilizes an exploit to try to load malware. Often it's XP Antivirus and its variants. Many of these are showing up in Google results too.

What do you think are the odds that exploited documents will be published to these documents too?

Re:Viruses (3, Informative)

Kalriath (849904) | more than 4 years ago | (#29509179)

What do you think are the odds that exploited documents will be published to these documents too?

Zero, because this is about Google Docs, not Google Groups.

Re:Viruses (1)

Anarchduke (1551707) | about 5 years ago | (#29511593)

People still infect their machines by opening executable attachments to email, not to mention the thousands of fucking viruses found on Limewire. Of course virus writers are going to find a way to use Google Docs to exploit people's stupidity.

The problem isn't Google Groups, Google Docs, Limewire, or anything else. The problem is that morons have access to the Internet.

Re:Viruses (1)

DanJ_UK (980165) | about 5 years ago | (#29512269)

I wish I had mod points.

un-publish? un-check? (1, Informative)

Anonymous Coward | about 5 years ago | (#29512429)

Just a point of clarity...

``un-publish it''

Un, as a prefix, means not; e.g. unaware means ``not aware''. Therefore ``un-publish'' would mean ``not published'' which is a state, not a verb.

If we wish to contort the English language, the ``correct'' prefix in this case would be ``de'', indicating negation. This would yield depublish, decheck and deinstall which, although ugly and cumbersome, are at least correct by the rules of the language.

For an example familiar to the audience, the process of decryption yields unencrypted plain-text.

Hopefully this will help to avoid any grammatical embarrassment when one's Google documents are presented unto the World.

robots.txt (0)

Anonymous Coward | about 5 years ago | (#29513677)

Problem solved.
