MS Research Automates Search Engine Spam Hunt

Zonk posted more than 8 years ago | from the i'm-all-for-less-junk dept.

Barbie Dollar writes "Researchers at Microsoft are working on an ambitious new project to hunt down and neutralize large-scale search engine spammers. The project, called Strider Search Defender, automates the discovery of search spammers through non-content analysis. The project integrates technology from two previous Microsoft Research prototypes (Strider HoneyMonkey and Strider URL Tracer) and promises a new approach to removing junk results from search engine queries."

68 comments

This just in..... (4, Funny)

Mayhem178 (920970) | more than 8 years ago | (#15714308)

Every anti-Microsoft blog and article in existence has been flagged as search engine spam.

More at 11.

I don't believe it (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15714481)

War is unfolding spreading across the Middle East. Israel is fighting a two-front war, while Iraq is in open insurrection.

The end times may be near, but all that you guys can think about is a post by "Barbie Dollar" about search engine optimization.

If any of us are still alive in ten years, none of the survivors will care about how to spam a search engine, or even what a search engine was. Get some priorities, will you?

Re:I don't believe it (1)

cartel (845256) | more than 8 years ago | (#15714753)

Why are you even spending time reading and posting on Slashdot at all if you're so worried about the middle east and the end times being near, hmmm??

Re:This just in..... (1)

LFTr (984547) | more than 8 years ago | (#15716865)

If they want to get rid of search engine spam, they should start with MSN Spaces. Try this: http://search.msn.com/results.aspx?q=viagra%20site%3Aspaces.msn.com [msn.com] or this: http://search.msn.com/results.aspx?q=cheap+hotels+site%3Aspaces.msn.com [msn.com] or this: http://search.msn.com/results.aspx?q=porn+site%3Aspaces.msn.com [msn.com]

In Redmond, Washington (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15714316)

The engine spams YOU.

Still seems reactive (2, Insightful)

MrNougat (927651) | more than 8 years ago | (#15714319)

Sure, preventing search engines from indexing blogspam posts is great. Maybe that's the first step, but it's not going for the root cause - the botnets that run the apps that post/email in the first place, and the compromised webservers hosting order sites.

It's not an either-or situation (4, Insightful)

ScentCone (795499) | more than 8 years ago | (#15714382)

Sure, preventing search engines from indexing blogspam posts is great. Maybe that's the first step, but it's not going for the root cause - the botnets that run the apps that post/email in the first place, and the compromised webservers hosting order sites.

These are not mutually exclusive goals. If you take away any incentive for spamalizing content (meaning, not only does it not boost your search placement, it penalizes you), then much of the pressure to run botnets and crack servers goes away.

Re:It's not an either-or situation (1)

mythosaz (572040) | more than 8 years ago | (#15714975)

It's an evil cycle.

Much like our spam emails have adapted and (mostly) overcome spam filters, link-farm search-hogs will adapt too.

As much as we'd like to remove the root cause, nobody's going to fix "greed" anytime soon.

In the meantime, like spam, we can make it more difficult for them to do business.

Re:It's not an either-or situation (0)

Anonymous Coward | more than 8 years ago | (#15715415)

not really. Before there was money in botnets there were bragging rights. That will always be there...

Re:It's not an either-or situation (1)

ScentCone (795499) | more than 8 years ago | (#15715584)

Before there was money in botnets there were bragging rights. That will always be there...

Bragging rights didn't include 10,000 more traffic trying to sell fake drugs and bogus Rolex watches and pay-per-click on pr0n ads. It's just not the same, scale-wise. People who own 500 domains that are all stuffed with search spam content aren't in the bragging rights game, I think. But of course, more and better security/practices is worth it no matter what, and not just on MS's part.

Ultimately helps AdWords and Google...maybe (1)

us7892 (655683) | more than 8 years ago | (#15714337)

Microsoft, by cracking down, could effectively decrease the spam sites, the results would be fewer AdWords and microAds displayed and clicked, and could lower revenue for Google and Yahoo.

A side effect is better search results, which would increase use of Google again. Where is MSN Search in all of this...I don't know. But fewer of those crap sites, the better.

Re:Ultimately helps AdWords and Google...maybe (0)

Anonymous Coward | more than 8 years ago | (#15714419)

A side effect is better search results, which would increase use of Google again.


You think that this approach by MS to improve their search results is going to affect Google how...?

Re:Ultimately helps AdWords and Google...maybe (0)

Anonymous Coward | more than 8 years ago | (#15714443)

Well, the post did say "project to hunt down and neutralize large-scale search engine spammers". Hunt down and neutralize, find and destroy, locate and kill. Not simply "make our MSN search results better."

Re:Ultimately helps AdWords and Google...maybe (1)

pimpimpim (811140) | more than 8 years ago | (#15715435)

They don't seem to remove the actual sites, just the entries in the search results. In this way, it helps only MS search, and rightly so, as google has been sleeping on this problem for way too long.

While google was developing online applications we weren't really waiting for, Microsoft correctly found the main spot of irritation in search results, and if they will manage to automatically remove those and provide the search results people want (not just the sponsored shit msn search has shown in the past), then they might prevail over google.

I bold-faced "automatically" because google's efforts so far seemed to be mostly manual actions [slashdot.org] at the wrong spots, which make little sense for cleaning out the mass of spam sites. In the case I linked to, BMW more or less has a legitimate goal to make people looking for BMW find their site, and the action of google was just plain stupid. If people can apparently shortcut the google search results, then google is the one making the failure, not the one misusing it. Hopefully google will wake up now, so we get at least 2 efforts to automatically clean spam sites out of the search results.

Will they share? (0, Redundant)

neonprimetime (528653) | more than 8 years ago | (#15714341)

Researchers at Microsoft are working on an ambitious new project to hunt down and neutralize large-scale search engine spammers.

So, if by some miracle, they actually discover a way to hunt down and neutralize the search engine spammers, what are the odds that they share this information with other Search Engine companies?

Re:Will they share? (1)

dedazo (737510) | more than 8 years ago | (#15714366)

I don't know how this is relevant. Would you expect Google to share something like this with Microsoft? When was the last time you saw Google or Overture sharing proprietary search algorithms with their competitors?

Re:Will they share? (1)

neonprimetime (528653) | more than 8 years ago | (#15714436)

But spam in general is a global problem (email and search engines), and there are global groups that have been put together to combat it. The government is trying to get involved too. You would think, if somebody comes up with a way to crush spam, they won't be allowed to keep that info to themselves long, because the global community wants that information. Maybe I'm just crazy, who knows.

Re:Will they share? (1)

idesofmarch (730937) | more than 8 years ago | (#15714454)

they won't be allowed to keep that info to themselves long

What do you mean by "allowed"? What legal means are available to this global community you speak of that would allow it to take by force something that is a trade secret?

Re:Will they share? (0)

Anonymous Coward | more than 8 years ago | (#15714623)

A secret government agency will force their way into the Microsoft campus at around 3:15am some early morning and confiscate all the Linux boxes (because that's where Microsoft stores their important information).

Re:Will they share? (1, Funny)

Anonymous Coward | more than 8 years ago | (#15714668)

I find your ideas intriguing and would like to subscribe to your newsletter.

Re:Will they share? (1)

AigariusDebian (721386) | more than 8 years ago | (#15715095)

Saying that spam is a global problem implies that someone must step in and solve that for everyone, but that is not how market economy works. If a company can make spam irrelevant to its customers, then it is great for the customers and thus to the company that managed to do that.
In fact I have read the research that this is based on and must say that there is absolutely nothing new or innovative - just a lot of number crunching trying to solve a complex problem in the most direct way possible - by throwing millions in computing power at it.
I am also in that research area and I think that direction is a dead end - non-content information can be faked much more cheaply, computationally, than this faking can be detected. Faking a real web site with consistent content and not too much advertisement is much much harder.

Re:Will they share? (3, Insightful)

idesofmarch (730937) | more than 8 years ago | (#15714392)

First, do not be so skeptical. Have you noticed how well Outlook 2003 spam filtering works? I realize the algorithm is different, but based on results, I have to say that it is probable that Microsoft will succeed with reasonable effectiveness.

Second, what business rationale is there to give away a competitive advantage (after spending millions to get it) in the very competitive search market, where, by the way, Microsoft is not the market leader?

Re:Will they share? (1)

shaneh0 (624603) | more than 8 years ago | (#15714607)

That's just typical Microsoft.

Google creates novel applications like MapReduce and GFS all the time. And, as usual, Microsoft is right there to incorporate the best ideas from Google and Yahoo into their search product. If only we could get Microsoft to embrace Open Source like Google has. .....What's that? Google's search software is proprietary? MSN DOESN'T RUN GFS? MapReduce ISN'T on SourceForge?

Those Bastards!

Who are they to spend millions of their own money to hire the best minds in the business and then not just GIVE IT ALL AWAY. You'd think they were actually trying to turn a PROFIT or something!

Re:Will they share? (1)

Amouth (879122) | more than 8 years ago | (#15714877)

all i want google to open up is their web server...

Re:Will they share? (1)

AigariusDebian (721386) | more than 8 years ago | (#15714941)

Actually Map and Reduce are basic concepts of any functional language, but the way they were integrated in a huge and fully automated job control system required some major vision. And a bunch of code.
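For readers who have not met the two primitives, here is a minimal single-process sketch of a word count built from a map step and a reduce step. The input documents and the helper names `map_phase` and `reduce_phase` are made up for illustration; this shows only the functional idea and says nothing about how Google's distributed job system is actually built.

```python
from functools import reduce

# Hypothetical input: a few short "documents".
documents = ["spam spam eggs", "eggs ham", "spam ham ham"]

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]

def reduce_phase(counts, pair):
    """Reduce: fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

# Apply the map step to every document, flatten, then fold with reduce.
pairs = [pair for emitted in map(map_phase, documents) for pair in emitted]
word_counts = reduce(reduce_phase, pairs, {})
print(word_counts)
```

The hard part the comment points at, scheduling these steps across many machines and recovering from failures, is exactly what this toy leaves out.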

Why should they? You shouldn't want them to. (2, Insightful)

ScentCone (795499) | more than 8 years ago | (#15714405)

So, if by some miracle, they actually discover a way to hunt down and neutralize the search engine spammers, what are the odds that they share this information with other Search Engine companies?

Their purpose is to make their own search engine more effective for users, thus generating more traffic for them. A nice side effect would be that Yahoo and Google, etc., would feel more pressure to integrate similar technologies into their own engines. As usual, competition produces the best results.

Re:Why should they? You shouldn't want them to. (1)

rts008 (812749) | more than 8 years ago | (#15715124)

That is my take on this also, wish I currently had mod points, if I did I would not be replying, but giving you +1 insightful instead- sorry I can't do better!

Re:Will they share? (1)

Zabu (589690) | more than 8 years ago | (#15714407)

Sharing is what you learn in kindergarten
 
This is industry, you don't share, you capitalize

Re:Will they share? (1)

CaymanIslandCarpedie (868408) | more than 8 years ago | (#15714452)

what are the odds that they share this information with other Search Engine companies?

Probably about the same odds as Google sending Yahoo and MSN detailed specs of their search algorithms or the 2008 Republican presidential candidate going out and campaigning for the Democratic candidate or the US shipping Iran a fully functional atomic weapon production facility ..... I could go on, but you probably get the idea. Sometimes competitors want to beat their opponents you see.

Re:Will they share? (0)

Anonymous Coward | more than 8 years ago | (#15715686)

... or the Democrat candidate saying something intelligent...

Re:Will they share? (1)

Pengo (28814) | more than 8 years ago | (#15714467)


Long Live Competition!

This is how these markets are supposed to work. Let the smartest/best company with the best product find success and enjoy the fiscal rewards.

If MSN can out-do Google, I'd move my search traffic there in a heartbeat. Of course Google won't let that happen, WE THE -CONSUMER- WINS! This isn't communism, no reason that a company should have to give their competition their work if they put the effort into solving a problem/finding a solution.

Re:Will they share? (1)

krewemaynard (665044) | more than 8 years ago | (#15714841)

Google will probably come up with their own, better methods. Besides, MS wants to crush Google, so no, they won't share.

Not before time. (1, Interesting)

SatanicPuppy (611928) | more than 8 years ago | (#15714342)

I'm all for people being allowed to try and game the system...Anything else would restrict the whole purpose of the Internet as a repository for whatever the hell someone wants to put in there.

At the same time, I'm all for search engines blacklisting people who game the system, parked domains, crap aggregator pages, etc. It's all about building a better mousetrap.

Re:Not before time. (-1, Flamebait)

Anonymous Coward | more than 8 years ago | (#15714462)

Idiot.

But I thought.. (4, Funny)

Anonymous Coward | more than 8 years ago | (#15714344)

..that Strider HoneyMonkey was Arwen's pet name for Aragorn?

Strider Hiryu (0)

Anonymous Coward | more than 8 years ago | (#15714353)

My guess is the next project will be called Strider Hiryu and this will eliminate said spam.

Re:Strider Hiryu (1)

kahei (466208) | more than 8 years ago | (#15718965)


Not so, I'm afraid -- he will never leave Eurasia alive.

good? (0, Offtopic)

tomstdenis (446163) | more than 8 years ago | (#15714364)

search-spam sucks bad. I'm tired of doing searches and finding 100s of useless links and "secondary search pages" with nothing but ads and other junk [spyware/adware].

Tom

Cover-up (1)

Kesch (943326) | more than 8 years ago | (#15714371)

"Strider Search Defender" is just a cover name. It's really the "Aragorn Search Defender"; it just likes to remain incognito so that spam-zombies don't think to hunt it down.

Re:Cover-up (2, Funny)

creimer (824291) | more than 8 years ago | (#15714414)

I was under the impression that the final name would be the "Half-Life 2 Search Defender", considering that the product will only have a half-life of usability before the Microsoft patching system kicks in.

option 1 or option 2? (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15714400)

ooh, ooh? will they be little funky spider-bots, or do we get the huge mecha-panther?

Re:option 1 or option 2? (0)

Anonymous Coward | more than 8 years ago | (#15715230)

you all fail at classic gaming.

Go Microsoft! (5, Insightful)

eebra82 (907996) | more than 8 years ago | (#15714408)

All major search engines have been doing this for quite some time. Google is probably the best hunter of them all and the most recent update, which occurred on June 27, banned a large number of spammers who had billions of sites indexed. Unfortunately, the war on spam is quite difficult. The spammers are working with non-content pages but it is a matter of time before they start generating non-gibberish content to spam with, too.

Hopefully, Microsoft's approach will give some effect and push other operators to work harder on preventing the web spam.

Amusingly, you're most likely getting affected only if you're searching for penis pumps, pornographic content and gambling.

Re:Go Microsoft! (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15714435)

...says someone with a link to a poker site in his sig.

Re:Go Microsoft! (1, Interesting)

Anonymous Coward | more than 8 years ago | (#15714871)

Amusingly, you're most likely getting affected only if you're searching for penis pumps, pornographic content and gambling.

And cracks, keygens, and warez.

Re:Go Microsoft! (1)

MillerAH (240692) | more than 8 years ago | (#15714902)

The spammers are working with non-content pages but it is a matter of time before they start generating non-gibberish content to spam with, too.
Like Cnet?

Re:Go Microsoft! (1)

Julian Morrison (5575) | more than 8 years ago | (#15715346)

It strikes me that the asymptote of this curve is "spammers" generating actual new, useful, interesting content to push their spam. In other words, the acme of un-blockable spam sites is an ad-supported nonspam site.

M$, Google and friends might actually drag them so far around illegitimacy they come back to legitimacy. Ironic, no?

Human Powered? (4, Interesting)

pembo13 (770295) | more than 8 years ago | (#15714439)

Seems to me that a group of 10 people could easily flag a large amount of spam websites. Is this currently being done by any major engine?

Re:Human Powered? (0)

Anonymous Coward | more than 8 years ago | (#15714744)

Yes, Google has started to do that. They have approximately 10,000 people in China who are working 24 hour shifts without pay.

Always nice to have the government on your side.

Dangerous. (1)

SanityInAnarchy (655584) | more than 8 years ago | (#15715670)

Arms race.

This is exactly what happens in email. You say "Oh! I can filter 99% of my spam by grabbing anything with 'Viagra' in the subject line!"

The spammers, noticing this, start using subject lines like "Urgent! Read now!"

You adjust your filter to watch for anything with "Urgent" in the subject line and "Viagra" in the body.

They send you Vi.ag.ra instead. You catch that, they send you Vlagra.

They send "Penis pills". You filter anything with "Penis". Then your friend changes their signature to "The Pen is Mightier than the Sword". Since your filter is smart enough to catch "Vi ag ra", it's also dumb enough to think "Pen is" means "Penis".
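The false positive described above is easy to reproduce. The following is a minimal sketch, assuming a hand-written filter that tolerates spaces and dots between letters; the helper names `obfuscation_tolerant` and `looks_like_spam` and the word list are hypothetical.

```python
import re

def obfuscation_tolerant(word):
    """Build a pattern that allows whitespace or dots between the letters of `word`."""
    return re.compile(r"[\s.]*".join(map(re.escape, word)), re.IGNORECASE)

# Hypothetical block list, taken from the words used in the comment.
BLOCKED = [obfuscation_tolerant(w) for w in ("viagra", "penis")]

def looks_like_spam(text):
    """Flag the text if any blocked word appears, even when split up."""
    return any(pattern.search(text) for pattern in BLOCKED)

print(looks_like_spam("Vi.ag.ra"))                            # caught
print(looks_like_spam("Vlagra"))                              # letter swap slips through
print(looks_like_spam("The Pen is Mightier than the Sword"))  # innocent signature flagged
```

The same looseness that catches "Vi ag ra" is what matches "Pen is", and a one-letter substitution still gets past it.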

You adjust your filter to assign a score based on how many bad things it notices, and you add a few good things to even the score -- like whitelisting a few close friends, and anything coming in with "I AM NOT SPAM" in the subject line. Of course, you realize it won't work entirely -- the spammers will eventually use "I AM NOT SPAM", and sooner or later you'll get an email from someone you never heard of, who wants to talk to you about a business proposition, who got your email from somewhere like a forwarded message or somewhere else on the Internet, and they don't add the "I AM NOT SPAM" flag. But for awhile, it works.
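A score-based filter of the kind just described might look roughly like this. The rules, weights, threshold, and whitelist address are all invented for illustration, not taken from any real filter.

```python
import re

# Hypothetical rules: (pattern, weight). Positive weights are "bad things",
# negative weights are "good things" that even the score.
RULES = [
    (re.compile(r"viagra", re.IGNORECASE), 3.0),
    (re.compile(r"urgent", re.IGNORECASE), 1.5),
    (re.compile(r"I AM NOT SPAM"), -4.0),
]
WHITELIST = {"friend@example.com"}  # hypothetical close friend
THRESHOLD = 2.5                     # hypothetical cut-off

def spam_score(sender, subject, body):
    """Sum the weights of every rule that matches the subject or body."""
    if sender in WHITELIST:
        return float("-inf")  # whitelisted senders are never flagged
    text = subject + "\n" + body
    return sum(weight for pattern, weight in RULES if pattern.search(text))

def is_spam(sender, subject, body):
    return spam_score(sender, subject, body) >= THRESHOLD

print(is_spam("stranger@example.net", "Urgent! Read now!", "Buy Viagra"))
print(is_spam("stranger@example.net", "I AM NOT SPAM urgent", "Buy Viagra"))
```

The second call shows the weakness the comment predicts: once spammers learn the magic phrase, it drags their score back under the threshold.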

Then the spammers start sending messages that contain no text at all, just a few large images.

You filter that, meaning you completely miss your grandmother's email -- family photos -- or your girlfriend's birthday surprise email -- you fill in the blanks.

Before you know it, you're spending all your spare time tweaking your spam filtering settings, and it's still not enough. You thought it would be so easy -- just a Perl one-liner used to block 99% of your spam, with 0 false positives! But things are changing too fast now. At some point, you get the genius idea to make it open source. Hundreds of like-minded people flock to it, desperate. Every day, your spamfilter downloads a new copy of the rules database, a collection of Perl one-liners used to catch spam. But you're getting hundreds of spams a day now, which means as soon as the spammers switch tactics, you could have a thousand spams in your inbox before you get the daily database update -- and that's assuming the daily update has a rule that blocks these.

Basically, you've created Spam Assassin [apache.org] . Works like an anti-virus program. It also means that someone has to get hit with a new virus (type of spam) before the filter can block it, but even when it's at its best, it's still nowhere near good enough. Remember, 95% accuracy on 500 spams a day means you still get 25 spams in your inbox.

This is why it's best to automate this kind of thing. Use a statistical filter such as dspam [sourceforge.net] , bogofilter [bogofilter.org] , or crm114 [sourceforge.net] . They are actually more accurate, when trained by humans, than a hand-coded filter.
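To make "statistical filter" concrete, here is a toy naive Bayes classifier with add-one smoothing. It is a generic textbook sketch under simplifying assumptions (whitespace tokenization, at least one training message per class) and is not the algorithm of dspam, bogofilter, or crm114; the class name `TinyBayes` and the training messages are made up.

```python
import math
from collections import Counter

class TinyBayes:
    """Toy two-class naive Bayes over whitespace-separated tokens."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, label, text):
        """A human tells the filter which class a message belongs to."""
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def _log_prob(self, label, tokens):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.words[label].values())
        # Class prior from message counts, then add-one smoothed word likelihoods.
        log_p = math.log(self.docs[label] / sum(self.docs.values()))
        for token in tokens:
            log_p += math.log((self.words[label][token] + 1) / (total + vocab))
        return log_p

    def classify(self, text):
        tokens = text.lower().split()
        return max(("spam", "ham"), key=lambda label: self._log_prob(label, tokens))

f = TinyBayes()
f.train("spam", "cheap pills buy now")
f.train("spam", "buy cheap watches now")
f.train("ham", "the pen is mightier than the sword")
f.train("ham", "lunch at noon tomorrow")
print(f.classify("buy cheap pills"))
print(f.classify("the pen is mightier"))
```

Nobody writes rules here; the filter's behaviour changes only when a human retrains it on new examples, which is the retraining loop the comment goes on to describe.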

So yes, you do need humans to train your web filter, but you also need your humans to continue to train and retrain a statistical filter. You can't just pick an arbitrary five websites and either assume that's all there is, or remove everything like those five, because that just starts the exact same arms race I've just described.

Re:Dangerous. (1)

shird (566377) | more than 8 years ago | (#15717001)

The difference being you are not creating a filter, but flagging sites manually as spam sites.

This is different because it is more difficult to set up a web site and domain (and build links to get in the top search result pages) than shoot off an email. Thus you are flagging the sites themselves, not the particular 'trigger words'.

Do a search for 'buy mobile phones' or some such crap.. look at the top 10 results or so. If they are obviously spam sites then flag them, and their entire domain. Have regular google searchers flag sites as 'this site was spam' through the toolbar and have them manually reviewed.

When you start knocking out the sites in entire domains and whois info's at a time, and are getting rid of mostly the spam sites hogging the top ten sites in search results, I don't think it would take too long to clean it out. Those sites then have to start from scratch as their domain is knocked out of any page rank they may have got, and hence any new sites they create are buried deep in the search results and are no longer a problem.

Re:Dangerous. (1)

SanityInAnarchy (655584) | more than 8 years ago | (#15727095)

When you start knocking out the sites in entire domains and whois info's at a time, and are getting rid of mostly the spam sites hogging the top ten sites in search results, I don't think it would take too long to clean it out.

Are you sure? DNS is BIG, and I'm pretty sure you can automate buying domains -- they're pretty cheap, too. Also, remember that whois info can be faked, and often is (deliberately) by sites like GoDaddy to say that GoDaddy owns the domain, hiding the info of whoever really controls/pays for it.

And what do you do when all is said and done, and they simply use Tripod, Freewebs, and MySpace?

Re:Dangerous. (1)

shird (566377) | more than 8 years ago | (#15727149)

Well my other comment about PR still stands. When they buy new domains or move to different sites.. they drop from the search results, google sandboxes them etc. Google also takes into consideration age of domains etc. Building PR takes a long time.. and the spam sites can't just move around all the time and still get traffic.

Re:Human Powered? (1)

otie (915090) | more than 8 years ago | (#15717213)

And if any of those 10 people happens to have a personal grudge against someone or something...

Re:Human Powered? (1)

pembo13 (770295) | more than 8 years ago | (#15717438)

Well here's hoping good management, and possibly a "weekend review" process would be helpful. Monday to Friday, 8 - 5, they search using popular keywords (and misspellings), compile a list, look for similar IP, host, etc. At the end of the week, have a 2nd party verify the results, Monday morning put in the block, rinse, repeat.
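The "look for similar IP, host" step could be sketched as below: group the week's flagged hostnames by the address they resolve to, and hand only the crowded addresses to the second reviewer. The hostnames, the documentation-range IPs, the `min_hosts` cut-off, and the function names are all assumptions for illustration.

```python
from collections import defaultdict

def cluster_by_ip(flagged):
    """Group flagged hostnames by the IP address they were seen on."""
    clusters = defaultdict(set)
    for host, ip in flagged:
        clusters[ip].add(host)
    return clusters

def block_candidates(flagged, min_hosts=3):
    """Return IPs hosting at least `min_hosts` flagged sites, for second-party review."""
    return {
        ip: sorted(hosts)
        for ip, hosts in cluster_by_ip(flagged).items()
        if len(hosts) >= min_hosts
    }

# Hypothetical list compiled by reviewers during the week.
week_list = [
    ("cheap-pills.example", "192.0.2.10"),
    ("buy-phones.example", "192.0.2.10"),
    ("hot-deals.example", "192.0.2.10"),
    ("small-blog.example", "192.0.2.77"),
]
print(block_candidates(week_list))
```

Requiring several flagged sites per address before anything is blocked also softens the grudge problem raised in the parent comment, since a single reviewer's single flag is not enough on its own.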

if it works (1, Flamebait)

BarryLoper (928015) | more than 8 years ago | (#15714447)

If they could make something like this work, it would be a big draw away from Google.

Of course, with their track record of Neat Ideas vs. Actual Products, (WinFS, etc.) I'm not holding my breath.

I am, however, wishing them luck.

Non-content based comment spam prevention (3, Informative)

IO ERROR (128968) | more than 8 years ago | (#15714469)

Microsoft forgot to mention my non-content based method of blocking comment spam entirely known as Bad Behavior [homelandstupidity.us] . And now that they seem to have swiped a few of my ideas, I'm going to have to go see what they're up to...

Re:Non-content based comment spam prevention (2, Insightful)

DarkWicked (988343) | more than 8 years ago | (#15715072)

I installed Bad Behavior a few months ago on a community website... for three days.
During three days I logged a lot of actual (and logged in) users being blocked... then I tried to spam my own site by using opera and a fake user-agent + elite proxy and I had no problem doing that...

So yes, I guess it has the qualities required to be a good microsoft product.

Good. (3, Interesting)

ExileOnHoth (53325) | more than 8 years ago | (#15714699)

This *must* be one of the next battle lines in the so-called search wars.

I remember the first time I saw google - I was blown away: "Wow. These results are exactly the web pages I was looking for!" But that's no longer the case when you search in google. They've really fallen behind in being able to separate out (or, as they say, "search for") the pages I want from the junk.

I hope google will win this war, but maybe microsoft chucking some money at the problem will help light a fire under google to get this fixed before someone else does it better. If searching at google no longer brings me relevant results better than any other source, I'm gonna start looking for somewhere else to search. Just like I did when I switched to google from yahoo back in the twentieth century.

and shut down? (0)

wardk (3037) | more than 8 years ago | (#15714751)

I seriously doubt MS is going to "shut down" every windows box on the planet. ...searching for spam, eh? oddly spam finds the rest of us, but at MS they have to "search" for it.

Re:and shut down? (1, Insightful)

Anonymous Coward | more than 8 years ago | (#15714897)

How the fuck did a post from somebody who clearly hasn't even read the ''summary'', let alone the article, get modded up "Insightful"? Mods, just because a poster has a low ID doesn't mean their posts are always worth reading.

For reference:
(a) What does shutting down Windows boxes have to do with searching for search-engine spam?
(b) How does search-engine spam "find" you?

Could it possibly be that you saw the word "spam", and your brain shut off while you wrote a nonsensical post that might just have made sense in the context of an article about email-spam zombie computers, but is totally irrelevant in the context of search-engine spam?

we can only hope (1)

Tibor the Hun (143056) | more than 8 years ago | (#15715354)

we can only hope that this research is as fruitful as their speech synthesis research, email spam blocking, multiplatform video codec, next-gen filesystem, advanced CLI shell, and portable computing.
yay for MS research!

Experimental... (1)

Tavor (845700) | more than 8 years ago | (#15715854)

So in other words, it'll be called Aragorn when it becomes master?

How about filtering domain kiters (1)

spion666 (922711) | more than 8 years ago | (#15716148)

Google could cut their spam to 1/4 if they stop accepting websites whose domains are less than 7 days old (Will render domain kiting useless)

Re:How about filtering domain kiters (1)

bogado (25959) | more than 8 years ago | (#15718764)

You do realize that a new domain takes exactly 7 days to be older than 7 days, or it takes X days to become X days old. If you put a random number of days, spammers will simply wait for this number of days before they will spam the site. In fact google already does this in a sense, younger sites have a lower page-rank than older similar sites.
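The age rule from the parent comment, and the objection to it, fit in a few lines. This is only a sketch: the function name `old_enough` and the dates are hypothetical, and a real engine would need a trustworthy registration date to begin with.

```python
from datetime import date, timedelta

def old_enough(registered_on, today, min_age_days=7):
    """Accept a domain only once it has been registered for `min_age_days`."""
    return (today - registered_on) >= timedelta(days=min_age_days)

registered = date(2006, 7, 1)  # hypothetical registration date
print(old_enough(registered, date(2006, 7, 5)))  # too young, rejected
print(old_enough(registered, date(2006, 7, 8)))  # one week later, accepted
```

The rule only delays a patient spammer by the chosen number of days; whether the delay alone defeats kiting depends on the registrar grace period, which the thread does not state.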

I do agree that extremely new sites with weird domains names should be scrutinised before entering the engine.

obvious answer (0)

Anonymous Coward | more than 8 years ago | (#15716559)

I would pay to use a search engine that removed all "blogs" and shopping sites from the results.

Good stuff, but not a big deal in Release 1 (1)

CurtMonash (986884) | more than 8 years ago | (#15716560)

This addresses a particular kind of spam page that is promoted in a particular way.

But it does nothing to address the vast majority of the pages that contaminate search engine results. I'm referring to automatically generated pages that look like good pages and hence rank well in search engines, but really have little except links and perhaps some public domain info. E.g., there could be one each for every resort hotel in Mexico. The search engine result turns up a summary that makes it look like there are "reviews" there. But either the reviews section is empty, or else they reproduce something that's available on dozens of other sites as well. In one case, apparently, a single such site had 4 billion "different" pages. I'm not making that number up.

More sophisticated kinds of link-network analysis will be needed before those bite the dust.
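As a side note on the "reproduce something that's available on dozens of other sites" case: copied text can be spotted with a simple content-based near-duplicate check. The sketch below uses word shingles and Jaccard similarity; it is a content technique, not the link-network analysis called for above, and the function names and sample texts are invented.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # both too short to shingle; treat as identical
    return len(sa & sb) / len(sa | sb)

# Hypothetical "reviews" from two generated hotel pages.
page_a = "great hotel near the beach with a pool"
page_b = "great hotel near the beach with a spa"
print(jaccard(page_a, page_b))
```

A high score between pages on unrelated domains suggests the text was copied rather than written, though it cannot say which page is the original.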

hmm... (1)

bnitsua (72438) | more than 8 years ago | (#15717091)

non-content analysis? isn't that patented by slashdot readers?