Is Microsoft Crawling Google?

CmdrTaco posted more than 9 years ago | from the put-on-your-foil-hat dept.

Microsoft 480

triplecoil writes "Jason Dowdell over at WebProNews has written a piece questioning a tactic Microsoft might be using to beef up its new search engine. He thinks they might be dipping into Google's results to supplement its own. Dowdell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."

480 comments

Don't concern yourself with this crap... (4, Insightful)

garcia (6573) | more than 9 years ago | (#10790801)

Has anyone out there seen similar behavior on their own sites? Please comment with your qualitative/objective data if so.

Sure, I see crawlers on my site all the time, sometimes hitting the same URL over and over again. Do I understand their repetitive behavior? No. Do I care what they are doing? No, as long as they are obeying my robots.txt.

I have complained before about MSNbot ignoring changes to robots.txt while Google happily changed its habbits (I can't find the link sorry). My recent fighting with Googlebot has come to a head when I had to disallow them access to my gallery completely because they refused to honor anything except Disallow: /. I had to go so far as to point Googlebot at my robots.txt and tell it to remove all the previous links. It was rather annoying dealing with support via email from Googlebot as they have apparently taken on the stance of "we don't care but you should put meta tags in all your files so that we don't index those pages." Umm, you are crawling MY site for YOUR profit, you do as I say, not the other way around.

Do I care if MSNbot is crawling Google and then finding sites and links to search? No as it's none of OUR concern. What is OUR concern is our own robots.txt and how the spiders interact with our sites through that file. Let Google deal with Microsoft/MSNbot if that's what needs to be done but don't concern yourself with it otherwise.
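For anyone who wants to see what a robots.txt file actually tells a given crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `example.com` URLs, and the `/gallery/` path are made up for illustration and are not the poster's real file. It only shows what a compliant bot should conclude, not what any real bot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /gallery/, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /gallery/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "msnbot"):
        print(agent, allowed(agent, "http://example.com/gallery/pic1.jpg"))
```

Whether a crawler honors the answer is up to the crawler, which is the whole complaint above.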

More lies from garcia (0)

Anonymous Coward | more than 9 years ago | (#10790883)

Google nor MSN "profit" from crawling your site, garcia.

Re:More lies from cowardly trolls (0)

Anonymous Coward | more than 9 years ago | (#10790906)

Yes, they most certainly do profit from the data they have amassed. If they didn't spider sites they wouldn't be visited by the public who wouldn't see their targeted ads.

Thus, they profit from my data.

Re:More lies from garcia (2, Insightful)

calibanDNS (32250) | more than 9 years ago | (#10790946)

Actually, search engines profit from ad revenue displayed on search result pages (among other things). The search engine with the best results SHOULD attract the most users. Increasing the number of users can correlate to increasing profits from ads. Thus, search engine sites profit from having THEIR 'bots crawl YOUR site. On the flip side, we as web users, profit (non-monetarily) by having a better search engine.

Google is Catholic? (5, Funny)

TheAmazingBob (801587) | more than 9 years ago | (#10790898)

"Google happily changed its habbits..."

Google is Catholic?

LOL (0)

Anonymous Coward | more than 9 years ago | (#10790915)

i dont get it....

Re:LOL (1, Informative)

Anonymous Coward | more than 9 years ago | (#10790960)

Habbit = What a priest wears
Habit = A regular behavior for a person/thing

Re:LOL (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10791051)

Not quite dumbass. Both meanings are spelled "habit".


No entry found for habbit.
Did you mean habit?


FUCK YOU SLASHDOT! You can never ban me! Ever heard of a fucking proxy!? Fuck all y'all.

Re:Don't concern yourself with this crap... (4, Insightful)

finkployd (12902) | more than 9 years ago | (#10790921)

Umm, you are crawling MY site for YOUR profit, you do as I say, not the other way around.

No offense dude, but you are the one who put the site out there publicly. Now if they are DoSing you then you have a valid complaint but robots.txt is just there as a friendly suggestion. I can write a search bot today that completely ignores it and there is nothing wrong with that (except perhaps ethically, but even that is arguable). If you don't want people (or bots) viewing it then password protect it or take it off the public interweb.

Re:Don't concern yourself with this crap... (2, Insightful)

garcia (6573) | more than 9 years ago | (#10790943)

Now if they are DoSing you then you have a valid complaint but robots.txt is just there as a friendly suggestion.

Crawling a gallery of images (and all image property links as well) all day for several days might be considered "DoSing"; I consider it rude.

You're right, they don't have to obey the robots.txt but they should when they say they will.

Re:Don't concern yourself with this crap... (1, Interesting)

Anonymous Coward | more than 9 years ago | (#10790958)

This is insightful? If your stuff is on the net, you should not expect it to remain private. So their bot is crawling your site. Get over it. If you don't want them crawling your stuff for profit, protect the directory or just ban them. Or just put meta tags in your pages like they said.

The bot should be treated as no different from another anonymous human. If not the Googlebot, one of the other search engines is bound to find it.

Mod parent up! (0)

Anonymous Coward | more than 9 years ago | (#10791057)

Someone else can see thru garcia's whining! Hey garcia: It's the internet. Either manually block the bots or STFU.

Difficult to do if Google doesn't want them to (5, Insightful)

Anonymous Coward | more than 9 years ago | (#10790807)

All Google has to do is run some unusual queries through MSN, check their logs, find the IP addresses and block them.
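A rough sketch of that idea: plant a few nonsense "canary" terms in queries sent to the other engine, then scan your own query log for clients that replay them. The log format (`<client-ip> <query>` per line), the canary string, and the IP addresses below are all assumptions for illustration; real query logs look nothing like this.

```python
import re
from collections import Counter

# Hypothetical log format: "<client-ip> <query string>", one request per line.
LOG_LINE = re.compile(r"^(?P<ip>\S+)\s+(?P<query>.*)$")


def ips_repeating_canaries(log_lines, canaries):
    """Count, per client IP, how many logged queries contain a planted canary.

    `canaries` must be lowercase; queries are lowercased before matching.
    """
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        query = match.group("query").lower()
        if any(term in query for term in canaries):
            hits[match.group("ip")] += 1
    return hits


# Made-up sample data using documentation-only IP ranges.
SAMPLE_LOG = [
    "192.0.2.10 zqxv-canary-7731 widgets",
    "198.51.100.4 cheap flights",
    "192.0.2.10 ZQXV-canary-7731",
    "203.0.113.9 weather",
]
CANARIES = {"zqxv-canary-7731"}

if __name__ == "__main__":
    print(ips_repeating_canaries(SAMPLE_LOG, CANARIES))
```

The addresses that show up are candidates for blocking; an ordinary user would have no reason to type the canary.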

Re:Difficult to do if Google doesn't want them to (4, Funny)

carpe_noctem (457178) | more than 9 years ago | (#10790875)

Why stop there? Google should just ban all of Microsoft's netblocks to prevent their employees from gathering useful information from them...

"Begun, this war of the corporations has!"

Re:Difficult to do if Google doesn't want them to (2, Funny)

Anonymous Coward | more than 9 years ago | (#10790900)

Microsoft could create a new distributed crawler that comes bundled with Windows! Every Windows user could crawl Google for them, and then Google's only option would be to block everyone using an MS product.

Remember, helping Microsoft is like helping yourself.

Re:Difficult to do if Google doesn't want them to (0, Troll)

superpulpsicle (533373) | more than 9 years ago | (#10791065)

Hey it's not easy to block any gorilla, nevermind a trillion dollar one. Though again, Google should just block the word "windows" and "microsoft" at the javascript level on the main page.

customer-provided IP addresses. (1)

morcheeba (260908) | more than 9 years ago | (#10790967)

Microsoft could always have the google queries come from the user's computer, and integrate the results on the user's computer before displaying it. This would be impossible to block with IP address, but may be blockable with some sort of query heuristic. I'd think this could be done with Java or ActiveX pretty easily (I'm more of an embedded programmer...)

Re:customer-provided IP addresses. (0)

Anonymous Coward | more than 9 years ago | (#10791000)

But they're not using Java (at least in Firefox) and I don't imagine they're using ActiveX in IE.

ISRAEL POISONED ARAFAT (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10790808)

Well looks like this mothercukginf bitch jjust got FIRST POST HOMIES YEAH YEAH TRIPLE P

Re:ISRAEL POISONED ARAFAT (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#10790843)

actually, no, you did not get FP. garcia [slashdot.org] did, as usual.

GARCIA OWNZ TR0LL4G3!

You don't say! (1, Funny)

xerocube (815443) | more than 9 years ago | (#10790821)

You mean M$ is searching through somebody else's stuff? Well... I'll be damned...

Re:You don't say! (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10790885)

Sort of like how Linus searched through SCO property.

Re:You don't say! (1, Funny)

Anonymous Coward | more than 9 years ago | (#10790969)

They're not searching Google's porn links fast enough.

Does it violate Google's Terms of Service (4, Insightful)

winkydink (650484) | more than 9 years ago | (#10790824)

If so, they have legal remedies.

If not, it's called doing business and gaining an advantage any legitimate way that you can.

I think the interesting bit is in the conclusion. If MS is using this to establish a baseline, they can benchmark their spider against Google's over time.

Re:Does it violate Google's Terms of Service (3, Insightful)

Lev13than (581686) | more than 9 years ago | (#10790905)

Does it violate Google's Terms of Service? If so, they have legal remedies.
If not, it's called doing business and gaining an advantage any legitimate way that you can.
I think the interesting bit is in the conclusion. If MS is using this to establish a baseline, they can benchmark their spider against Google's over time.


If I copy your work and take credit for it, does it violate your terms of service? If so, you have legal remedies. If not, it's called doing business and gaining an advantage any legitimate way that I can.

Furthermore, I think the interesting bit is in the conclusion. If MS is using this to establish a baseline, they can benchmark their spider against Google's over time.

Re:Does it violate Google's Terms of Service (1)

winkydink (650484) | more than 9 years ago | (#10790965)

In the case of a listing of pages on the internet, my guess is that it would be considered akin to the data in the phone book, which was recently ruled not subject to protection by copyright.

But, I am not a judge. Or a lawyer. And I expect that if Google litigated here, they would be setting precedent.

uhh, law DOES matter (1)

tacokill (531275) | more than 9 years ago | (#10790997)

If you copy his work without permission, you've already committed copyright infringement -- so yes, you violate the TOS by default.

Comparing this to the MS/Google situation is not the same so the grandparent post still stands.

First Post (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#10790828)

I'm feeling lucky!

Just goes to show (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#10790829)

Niggers will steal anything.

Yea, and (5, Funny)

BrianGa (536442) | more than 9 years ago | (#10790830)

The new search engine's name will be Mooglesoft.

Spork or foon? (1, Offtopic)

3770 (560838) | more than 9 years ago | (#10790899)


So, what name do you favor for the combined fork and spoon utensil?

Spork or foon?

Re:Yea, and (4, Funny)

MooseByte (751829) | more than 9 years ago | (#10790927)


"The new search engine's name will be Mooglesoft."

Which will subsequently be sued by SCOogle, the latest startup from The Canopy Group, after announcing they purchased the rights to the Internet in a complex transaction which is documented in a briefcase somewhere in Germany.

Re:Yea, and (3, Funny)

meabolex (788745) | more than 9 years ago | (#10790953)

Initiating a Mooglesoft search:

Instead of clicking a button named Google Search, it simply says "KupoKupo!"

You are then returned a page where 100% of the text is the word "Kupo"

This is slightly less optimized than a Marklar search (which at least has some words other than 'Marklar').

Put down the PS2 controller (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#10791049)

And go get a life you fucking loser.

People like you should be put to sleep.

What do you expect? (1)

r2q2 (50527) | more than 9 years ago | (#10790838)

Really, since the google search results are public knowledge why wouldn't microsoft crawl google's stuff? If msn search can crawl the web why should it limit itself to everything except google/yahoo? Although this tactic may work the importation of all of googles massive search database might take awhile.

i (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#10790839)

post first

But will this mean Google can crawl back? (5, Funny)

biffnix (174407) | more than 9 years ago | (#10790840)

Couldn't Google just crawl Microsoft in return? Then they'd be stuck in an endless loop, and William Shatner can then swoop in, crack some skulls, and save the day.

Or something like that.

biffnix

Why not? (1, Insightful)

Anonymous Coward | more than 9 years ago | (#10790841)

Doesn't that mean even more results?

I'd do the same thing if I could. This is all "speculation" anyway, but since it feeds the stereotype of the insidious Microsoft, it gets posted front page to this "tech news" site.

Microsoft stealing someone elses technology??? (4, Funny)

Shant3030 (414048) | more than 9 years ago | (#10790842)

Nah, never happens....

Re:Microsoft stealing someone elses technology??? (1)

Picard102 (803951) | more than 9 years ago | (#10791018)

I fail to see how they are stealing any of Google's technology. Data maybe.

Re:Microsoft stealing someone elses technology??? (3, Interesting)

isometrick (817436) | more than 9 years ago | (#10791058)

Google's "data" is collected, generated, and stored by their technology.

I won't steal your oven, but I'll steal your food!

Legallity (0)

Anonymous Coward | more than 9 years ago | (#10790847)

Surely this is just as illegal as going through someone else's trash, it is still not your property...

I guess that is the point

Nothing new... (1)

Moth7 (699815) | more than 9 years ago | (#10790914)

It was how Mr Gates learnt to code in the first place ;-)

Re:Nothing new... (0)

Anonymous Coward | more than 9 years ago | (#10791001)

And you must have LEARNED how to spell by stealing from mexicans, you fucking retard.

Re:Legallity (1)

La Camiseta (59684) | more than 9 years ago | (#10790978)

I wouldn't go that far. As soon as you put your trash on the curb, it becomes public property, and anyone can go through it. It's when they go through trash and it's in your side yard that it's illegal.

That's why it's technically illegal to go dumpster diving in dumpsters that are enclosed in those little brick cubes behind buildings. Although I've never really had a problem with them while dumpster diving. They can sure as hell, and probably would, get you for dumping your trash there.

They been crawling like mad lately (5, Interesting)

mpost4 (115369) | more than 9 years ago | (#10790852)

I can say that they have been crawling like mad as of late: Google, Yahoo, and MSN. I say this because on my site I have had a lot of traffic from all three, and my site is not a popular or even an important one, but I have seen a lot of traffic from them. Not just once a week or a few times a week but every day. There are big updates coming. I was not surprised to see the article about Google doubling their index; I knew something was coming from the way they are crawling unimportant/unpopular sites.

As long as it's legal (1)

arbi (704462) | more than 9 years ago | (#10790854)

As long as it's legal and helps Microsoft, I highly doubt that Microsoft would be concerned about the ethics of doing such a thing. The author is probably right.

Re:As long as it's legal (0)

Anonymous Coward | more than 9 years ago | (#10790876)

It's not legal. It violates Google's EULA (no automated crawling), and we know MS loves their EULAs.

Re:As long as it's legal (1)

Usquebaugh (230216) | more than 9 years ago | (#10791061)

Occams razor

As long as it helps Microsoft, I highly doubt that Microsoft would be concerned about the ethics of doing such a thing.

This guy is a dumbass... (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10790858)

How did this make front page?
Holy shit our editors suck. This guy is fucking dumb.

Not a very effective tactic (1)

mg2 (823681) | more than 9 years ago | (#10790859)

I think Google could have some fun if MS was indeed just screen scraping... I don't think it would be too far fetched to alter results for a certain Microsoft-operated IP.

Try this term on MSN search (5, Funny)

bbzzdd (769894) | more than 9 years ago | (#10790860)

more evil than satan [msn.com]

ROOFLES!

Re:Try this term on MSN search (5, Funny)

JohnnyKlunk (568221) | more than 9 years ago | (#10790949)

OK. This is really freaky. Try

more evil than god [msn.com] and you get FIREFOX as the first result (then google, of course)

Re:Try this term on MSN search (1)

Anonymous Coward | more than 9 years ago | (#10790989)

can someone explain this???

Re:Try this term on MSN search (2, Funny)

stratjakt (596332) | more than 9 years ago | (#10791090)

Realize it takes into account popularity of the site, and occurrence of the words, and I believe the word types are ranked too, nouns before verbs before adjectives before adverbs.

The Firefox page is fairly popular, and the words "more" and "than" appear over and over, as with Google. (Uh, googles motto "do no evil" wouldn't hit another word, hmmmmmmmmm)

Try this one (seriously): more gay than slashdot

Re:Try this term on MSN search (2, Informative)

fireshipjohn (20951) | more than 9 years ago | (#10790998)

Now try it on google and you get articles about the 'more evil than....' debate.

I know which search engine I'm sticking with :)

Re:Try this term on MSN search (4, Funny)

finkployd (12902) | more than 9 years ago | (#10790966)

That they put google up there as the number one search result is not that surprising. What gets me is they have themselves at number four.

Re:Try this term on MSN search (2, Interesting)

Kalak451 (54994) | more than 9 years ago | (#10791011)

Also note that the "SPONSORED SITES" part of the page goes away on that search.

Re:Try this term on MSN search (2, Funny)

}InFuZeD{ (52430) | more than 9 years ago | (#10791050)

I'm not sure if it's funnier that Google is #1, or that Microsoft lists itself as #4.

Re:Try this term on MSN search (1)

Picard102 (803951) | more than 9 years ago | (#10791056)

http://www.cnn.com/TECH/computing/9911/15/search.engine.ms.idg/ Google has played a similar game.

They wouldn't... (4, Funny)

Wrathie (668211) | more than 9 years ago | (#10790863)

Such trouble. Just buy the damned company.

Re:They wouldn't... (4, Funny)

RobertB-DC (622190) | more than 9 years ago | (#10790924)

Such trouble. Just buy the damned company.

Come on, be serious. Google doesn't plan to buy Microsoft until *after* they reach the one-year post-IPO mark, silly.

Re:They wouldn't... (1)

stinkyfingers (588428) | more than 9 years ago | (#10791024)

Why buy the company when they could just steal all the company's "assets", then put them out of business?

Think ruthlessly.

Re:They wouldn't... (1)

jessecurry (820286) | more than 9 years ago | (#10791079)

Why doesn't M$ just use google's search technology? It's highly doubtful that M$ will ever be able to create a better search engine than google (not that google is perfect, but M$ will probably try to add some strange integration with Windows, opening up a huge security flaw and crippling home PCs for years to come).

Shocked I tell you (5, Funny)

finkployd (12902) | more than 9 years ago | (#10790869)

Well, that kind of business practice would be completely out of character for Microsoft.

This is a non-story. A good Slashdot headline will be when they get caught actually NOT doing something like this.

Microsoft Has Original Idea and Implements it By Themselves
From the 70%-of-slashdot-editors-suffered-heart-attacks-reading-this-submission Dept.

Re:Shocked I tell you (2, Funny)

oGMo (379) | more than 9 years ago | (#10791006)

Microsoft releases "Bob"
From the laugh-it's-funny Dept.

MSN and Google (1, Troll)

stratjakt (596332) | more than 9 years ago | (#10790888)

Can both crawl up my ass.

And who cares what Jamie Crowell (or whoever), random blogger, thinks MSN might be doing, no doubt based purely on "ms sucks" rhetoric?

Re:MSN and Google (1)

rpdillon (715137) | more than 9 years ago | (#10791077)

Hmm, did you actually *read* the article?

If URLs on your site are old (i.e. 404s) and are only indexed in Google, and yet you find MSN crawling them, only to find that their index is updated with those results shortly thereafter, well, that qualifies as something more than "'ms sucks' rhetoric". "Who cares?" might be a more appropriate retort.

Bloggers are just people. So are reporters. Just because some dude said it in a blog doesn't make it unreliable, any more than a journalist saying it makes it reliable.
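If you want to look for the same pattern on your own server, a minimal sketch is below. It assumes an Apache-style "combined" access log, a user agent containing the token `msnbot`, and a hand-maintained set of retired paths; the sample lines and user-agent strings are invented for illustration, so adjust all three to whatever your logs really contain.

```python
import re

# Pulls the request path, status, and user agent out of a "combined" log line.
COMBINED = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bot_hits_on_retired_urls(log_lines, retired_paths, bot_token="msnbot"):
    """List (path, status) for requests by the named bot to retired paths."""
    hits = []
    for line in log_lines:
        match = COMBINED.search(line)
        if not match:
            continue
        is_bot = bot_token in match.group("agent").lower()
        if is_bot and match.group("path") in retired_paths:
            hits.append((match.group("path"), int(match.group("status"))))
    return hits


# Made-up sample lines; the paths and agents are illustrative only.
SAMPLE_LOG = [
    '192.0.2.7 - - [10/Nov/2004:10:00:00 +0000] "GET /old/page.html HTTP/1.0" '
    '404 209 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.8 - - [10/Nov/2004:10:00:05 +0000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '198.51.100.2 - - [10/Nov/2004:10:01:00 +0000] "GET /old/page.html HTTP/1.1" '
    '404 209 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
RETIRED = {"/old/page.html"}

if __name__ == "__main__":
    print(bot_hits_on_retired_urls(SAMPLE_LOG, RETIRED))
```

A hit only shows the bot asked for a dead URL. It says nothing about where the bot learned the URL, which is the part the article speculates about.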

If this were true... (1)

barcodez (580516) | more than 9 years ago | (#10790903)

Look I dislike M$ as much as the next guy, but if this were true then it would become immediately obvious to Google as they would be receiving a huge number of page requests from Microsoft. It would become even more obvious because they would be of the form

site:example.com


Doing this for, say, 100,000 domains would be noticeable but would not even scrape the surface of what's on the web.

Meta-search? (3, Interesting)

grasshoppa (657393) | more than 9 years ago | (#10790904)

The question is why? If they are doing this, are they simply going to present the results as their own, or are they going to work some magic and find the most relevant search results from ALL the engines and use those.

In the first case, it's a slimy business practice. In the second, it's fairly cunning ( and has been tried before ).

In either case, I doubt google is in any real danger. They are to search engines what MS is to the desktop. And while MS has squandered that advantage in the desktop arena ( reader homework: 250 word essay as to why ), google is only improving on their work.

Block? (1)

worm eater (697149) | more than 9 years ago | (#10790912)

Why can't Google just block MS from crawling their site? Wouldn't Google notice if other spiders were crawling them?

Firefox rendering (1)

xPhoenix (531848) | more than 9 years ago | (#10790933)

Maybe it's just me, but this beta search engine page renders better in Firefox than in IE. What browser are MS's devs using for their testing?

Re:Firefox rendering (1)

JustNiz (692889) | more than 9 years ago | (#10790991)

I find nearly all pages render better in Firefox than IE. Especially with adblock installed :-)

Msn Crawling (3, Informative)

clinko (232501) | more than 9 years ago | (#10790936)

If you've been watching the logs to your site lately Microsoft has been RAPING most servers. Most crawlers will pick through pages with large lists 1 at a time, then come back every hour or so.

MSN starting last week has been pulling EVERY LINK in sequence from my site. Even the larger Artist Index pages [clinko.com] of my site.

Seriously, I've had this same spider on my site for about 36 hours now.

Violates Google's TOS (5, Informative)

Anonymous Coward | more than 9 years ago | (#10790937)

From Google's Terms of Service [google.com]
Personal Use Only

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.

IP (1)

OxygenPenguin (785248) | more than 9 years ago | (#10790940)

Next thing you know, MS will be suing Google for IP rights to their cache. That, or buying the cache from them for $50k and then sucking to IBM.

Just speculation right now... (1)

HermesHuang (606596) | more than 9 years ago | (#10790973)

After reading the article all of this is based on one result and a bit of speculation. However, if true, I would hope Google quickly finds a way to block this.

What would be funny is if Google could detect when it is Microsoft sending a query through their system and return random results. Or return 5000 results all of which are redirects back to the MSN search page. And of course, Microsoft can't complain about such a thing because in doing so they'd admit they're trying to use Google's results.

I wonder how long some of the less intelligent MSN users would spend at the search page clicking on links that redirected back to the MSN search page?

Absurd (4, Insightful)

targo (409974) | more than 9 years ago | (#10790984)

The claims are so absurd I don't even know where to start.
1) His whole theory is based on the "fact" that the only way in the world to find his pages is to use site:www.sitename.com in Google, implying that Google has cached the results from an earlier crawl. Of course, there is no way that the Microsoft search couldn't have also cached it.
2) Then, he claims that Microsoft is probably screen-scraping Google's results (for all the millions of sites out there), and using these results to recrawl those sites? This doesn't even make any sense.
3) And last but not least, Microsoft is certainly basing its whole search architecture on the assumption that Google wouldn't ever notice MSN mirroring its whole index. Yeah right.

Could this be the end? (1)

beaststwo (806402) | more than 9 years ago | (#10791004)

Could this be the end of only unique content on each Web page on the Internet? We've had to suffer all these years with no duplication of content and not a single case of recursive linking between web pages.

It's almost refreshing to see that the Internet may well be catching up to television...Media maturity at last!

Another thing to consider is ... (0)

Anonymous Coward | more than 9 years ago | (#10791010)

... the MSN search beta that I saw stole everything from Google anyways.

The user interface originally looked like Google. They clustered commodity PCs in the same 'shard' configuration as Google. Their ranking algorithms considered links like Google.

They have done nothing innovative, and they are continuing to chase taillights. Let's hope they don't catch up.

Probably Not.. (2, Interesting)

DelawareBoy (757170) | more than 9 years ago | (#10791012)

My website is the #1 site listed with specific Criteria on Google. Consistently for the last 2 months. I try the same thing with MSN search and My site does not even show up at all.

If they are searching Google, they haven't done it recently, or else they haven't gotten to my site yet.

Spike the results, then sue (4, Informative)

G4from128k (686170) | more than 9 years ago | (#10791014)

It would be easy for Google to insert a small fraction of non-sequiturs in the results, look at Microsoft's search results, and then sue for misuse. Even if MSFT uses random proxies to avoid detection, it cannot manually recheck all the hits to make sure they are correct (if they could, they would have the resources to check all the sites, and then they would not need to crawl Google). A few made-up sites or inappropriate search hits would be enough to establish a pattern of abuse.

Limit (1)

rattler14 (459782) | more than 9 years ago | (#10791025)

I might be mistaken, but I thought google has a 10,000 query limit per IP address per day. So it might be conceivable that enough computers over several days could get it, though I imagine it wouldn't be trivial

I think this is mentioned in Google Hacks by O'Reilly. Those with an online account there can check it out and mock me if I'm wrong :)
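To make the arithmetic concrete, here is a small sketch of a per-IP daily quota and the number of "IP-days" a scraper would need under it. The 10,000 figure is only the recollection above, not a confirmed limit, and the 1,000,000-query total is a made-up example; `DailyQuota` and `ip_days_needed` are hypothetical names.

```python
from collections import defaultdict


class DailyQuota:
    """Allow at most `limit` requests per (ip, day) pair."""

    def __init__(self, limit):
        self.limit = limit
        self._counts = defaultdict(int)  # (ip, day) -> requests allowed so far

    def allow(self, ip, day):
        key = (ip, day)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


def ip_days_needed(total_queries, limit):
    """Ceiling of total_queries / limit: one IP for one day is one IP-day."""
    return -(-total_queries // limit)


if __name__ == "__main__":
    # Hypothetical totals: 1,000,000 queries under a 10,000/day/IP cap.
    print(ip_days_needed(1_000_000, 10_000))
```

So 100 machines for one day, or 10 machines for ten days, would cover the hypothetical million queries. Not trivial, but not out of reach either.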

try this (0)

Anonymous Coward | more than 9 years ago | (#10791036)

search google for site:google.com

Yep. (1)

Skiron (735617) | more than 9 years ago | (#10791047)

I see bots hitting a cgi test set-up forum I ran 2 years ago (before uploading to remote ISP) STILL try to index pages. I think the bloke is spot on with his analysis.

They really only need to seed their crawler... (5, Interesting)

JustNiz (692889) | more than 9 years ago | (#10791052)

You can't get to every page on the internet just by starting at one page and recursively following links; therefore the more places you start from, the more likely you are to have 100% coverage.

I could imagine that Microsoft just needs a few thousand URLs evenly spread across the internet just to seed their crawler, which they can get from Google by using a list of most popular queries.

Once their crawler has so many starting points it can do the rest itself.
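A toy illustration of why seeds matter, as a breadth-first crawl over a tiny invented link graph. The page names and links are hypothetical; the point is only that pages on an "island" nobody links to are never reached unless one of them is a seed.

```python
from collections import deque

# Toy link graph (hypothetical): page -> pages it links to.
LINKS = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": [],
    "x": ["y"],  # "x" and "y" form an island not linked from a, b, or c
    "y": ["x"],
}


def crawl(seeds, links):
    """Return every page reachable from the seeds by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    print(sorted(crawl(["a"], LINKS)))       # one seed misses the island
    print(sorted(crawl(["a", "x"], LINKS)))  # a second seed reaches it
```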

Hello!!... (1)

Eggplant62 (120514) | more than 9 years ago | (#10791085)

It's called a router. It can be set to null route whole chunks of IP address space. Set it to forget where Microsoft is and forget it.