The Internet

WWW Surpasses One Billion Documents

Gary William Flake writes "A new study by Inktomi and NEC Research Institute shows that there are at least one billion unique indexable Web pages on the Internet. The details are pretty interesting; for example, Apache dominates the server market."
This discussion has been archived. No new comments can be posted.


  • I can just see giant billboards now:


    Apache: Millions and Millions Served
    (For Free!)
  • Not first post, but first with my threshold ;) Seriously, it's nice to see Apache sticking out again. Should do fairly well for marketing Linux.
  • approximately 7 of them are useful.
  • by Capt Dan ( 70955 ) on Tuesday January 18, 2000 @07:42AM (#1361309) Homepage
    Longest domain name:
    http://www.tax.taxadvice.taxation.irs.taxservices.taxrepresentation.taxpayerhelp.internalrevenueservice.audit.taxes.com


    gee. A tax site with a long, unintelligible, confusing domain name. Go figure.

    "You want to kiss the sky? Better learn how to kneel." - U2
  • Glorious gravy, the web has breached 1,000,000,000 indexable pages. And just like radio, network television, cable and satellite before it, the new gag is:

    A billion pages of information, and nothing's on.

    Now, if real life exemplified the web, we'd know that 85% of the earth's population speaks English and, as can be expected, the IRS's domain name proves to be a lesson in redundancy and triplicate.
  • by dsplat ( 73054 ) on Tuesday January 18, 2000 @07:43AM (#1361311)
    Yes, and the Jargon File [tuxedo.org] already has a comment on that, originally from Theodore Sturgeon [tuxedo.org]:

    Sturgeon's Law prov.

    "Ninety percent of everything is crap". Derived from a quote by science fiction author Theodore Sturgeon, who once said, "Sure, 90% of science fiction is crud. That's because 90% of everything is crud." Oddly, when Sturgeon's Law is cited, the final word is almost invariably changed to `crap'. Compare Hanlon's Razor, Ninety-Ninety Rule. Though this maxim originated in SF fandom, most hackers recognize it and are all too aware of its truth.


  • by (void*) ( 113680 ) on Tuesday January 18, 2000 @07:45AM (#1361312)
    That's what I hate about such "statistics". No information or context is given. One is not told how this estimate of "one billion" was arrived at. No details about the research methodology were forthcoming. Instead one is only supposed to stare slack-jawed in amazement at the touted figure of one billion and be impressed. That anyone considers THIS an achievement is amazing.

    For all you know - the web has surpassed at least 1 webpage count. Big Fscking Deal!!!

  • <DrEvil>One... billion pages</DrEvil>

    Sorry - couldn't resist. :=]
    ________________________

  • Check out the longest domain name: http://www.tax.taxadvice.taxation.irs.taxservices.taxrepresentation.taxpayerhelp.internalrevenueservice.audit.taxes.com
    Of course it's about taxes. You've got to hand it to the IRS: even their URLs are hard to read and understand. I wasn't able to open this link; can anybody else?

  • by dsplat ( 73054 ) on Tuesday January 18, 2000 @07:47AM (#1361316)
    Why is one of them Hamster Dance [hamsterdance.com]? Don't go there with an 18 month old child on your lap. For an adult, this is funny once. For a toddler, it is funny every time the computer is on.
  • Capt Dan...

    This was never a real site, I very much doubt it ever was. It was definitely just used for search engine spamming, nothing else.

    - TheOpus
  • by TheCodeMaster ( 101307 ) on Tuesday January 18, 2000 @07:49AM (#1361318)
    dynamic content makes the technical quantity of distinct "pages" far greater than a billion.
  • is that big cluster of Sun boxen. Now that is some serious admining and "play?". Just think of clustering them all together and playing a mean game of Quake. Enough processing power to run the super bot that is smart enough not to get detected. I'm dreaming.

  • Alright, there are 6 billion people on the planet. Many of these people work for companies or governments whose websites probably comprise hundreds of indexable pages each. Some places auto-update things, producing hundreds more indexable pages. There are also millions of (pointless) personal sites, and some people manage more than one site. I'm shocked that we're only at one billion now.

    -BlightX
  • The details are pretty interesting; for example, Apache dominates the server market.

    Anybody following Netcraft's [netcraft.com] Web Server Survey [netcraft.com] already knew this. But it's still nice to get it confirmed from additional sources.

  • First, let's pick that nit: It was probably a hostname.

    Secondly: It isn't anything at the moment, it won't resolve. I can't even resolve audit.taxes.com.
  • by Anonymous Coward
    The web has an infinite number of indexable webpages; just look at dynamic webpages and CGI-driven webpages. If you want proof, go search for [A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][A-Z] at www.altavista.com. That's an example of 208,827,064,576 (26^8) indexable webpages, each one different. Wee.
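
    Just to sanity-check that arithmetic, here is a quick Python sketch (nothing AltaVista-specific about it):

    # Sanity check of the 26^8 figure above: eight positions, each one of
    # the 26 letters A-Z.
    from itertools import product
    from string import ascii_uppercase

    print(26 ** 8)   # 208827064576, the number quoted in the post

    # Lazily list the first few of those combinations without ever
    # materializing all ~2.1e11 of them.
    for i, combo in enumerate(product(ascii_uppercase, repeat=8)):
        print("".join(combo))
        if i == 4:
            break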
  • by billh ( 85947 ) on Tuesday January 18, 2000 @07:52AM (#1361325)
    Well, as any of us geeks know, this isn't really news. I'm sure we passed the billion mark a long, long time ago. Inktomi just wants the publicity, and some news service will probably pick this up, most likely CNN.

    One thing of interest, though. If you look under the "Web server market share", Red Hat and mod_perl are apparently web servers now.
  • The URL doesn't exist.

    Taxes.com isn't in use (except by a domain hoarding company).

    The IRS also has nothing to do with it either.

    Hmm...
  • ...and at least 500 million of those pages are at microsoft.com. I know, from personal experience (just the other day it took me 20 minutes to find PWS for Win95). Ever tried to find something buried in there?

    Online gaming for motivated, sportsmanlike players: www.steelmaelstrom.org

  • I think what it actually means is that a search engine had one billion web pages indexed. Not that I disagree with you. This is totally meaningless. I have a couple of web pages that no one has linked to. I don't think they checked the database for broken links.

    How long does it take to count 1 000 000 000 links anyway?
  • Just looking at the top three:

    Apache 60.33%

    Microsoft-IIS 25.26%

    Netscape-Enterprise 3.79%

    Wow - Apache still kicks everyone else's butts, and not by a small margin! I think Apache is about the perfect case for OSS development - not just being a blip on the radar getting larger, but, covering almost the entire radar screen!

    I'd love to see more stats out of Inktomi on this, but it's still cool to see what little they did provide (261,472 links to MP3.com should say something about the digital music scene).

  • see: MS Press Release [microsoft.com] and Inktomi Server Cluster [inktomi.com]

    Stuff like that makes me smile ;)

    -Peace
    Dave

  • So were there three links to www.extraghost.com [extraghost.com] before they wrote the page, or after? And which one of the band members works at Inktomi? And will it be four after I post this comment?
  • I'm sure we've all noticed that there are many great pages out there, pages like this and others that provide good hubs of communication, and also several good personally run sites that have good information, or are at least entertaining.

    Also note that while these pages exist, there is also a lot of random crap out there that really just wastes space and time. As the number of pages increases, I'm sure that it will be harder and harder to find quality documents among the wasteland of stuff we don't need.

    "You ever have that feeling where you're not sure if you're dreaming or awake?"

  • These results seem a little strange to me; there is no explanation or context for them.

    Why did they list the number of links to rickymartin.com or cooking.com?

    Why did they list the longest URL as a nonworking URL that was probably only ever used to spam the search engines?

    Oh great, uh, hey guys, today I have determined there are 1 billion webpages!

  • by Greyfox ( 87712 ) on Tuesday January 18, 2000 @07:59AM (#1361334) Homepage Journal
    Now INDEX it.

    Finding information on the web is increasingly going to be like trying to find hay in a needle stack. Already the current indexing engines can't keep up, and you have unscrupulous web authors putting bunches of keywords unrelated to their site into their meta tags to ensure that they get mentioned in every single search. Some indexing engines already ignore meta tags for that reason. And how many times have you tried Altavista, Excite or Google only to find that the page you're trying to get to has expired, or is 8 years old and hasn't been changed in 7?

    This issue is going to have to be addressed, because the web is going to continue growing.

  • > There are also millions of (pointless) personal sites

    Personal sites are what the web is for! All
    this commercial and ecommerce stuff is just silly.
  • An almost infinite number of monkeys banging away on a similar number of typewriters will eventually reproduce the works of Shakespeare.

    The internet disproves this hypothesis.

    But seriously - has anyone figured out how long it would take to reproduce certain documents at random - such as the works of Shakespeare?
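
    A back-of-the-envelope answer, assuming a single monkey typing one uniformly random key per second from a 27-character alphabet (both numbers made up purely for illustration):

    # Rough expected time for ONE monkey typing uniformly at random
    # (27 keys: a-z plus space, one key per second) to hit a given text.
    ALPHABET = 27
    KEYS_PER_SECOND = 1.0

    def expected_years(text_length: int) -> float:
        # Expected keystrokes before the text shows up is roughly ALPHABET ** length.
        attempts = ALPHABET ** text_length
        return attempts / KEYS_PER_SECOND / (60 * 60 * 24 * 365)

    print(f"{expected_years(len('to be or not to be')):.2e} years")
    # ~1.8e18 years for one short line; the complete works are hopeless.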
  • some news service did pick this up...slashdot.
  • by Signal 11 ( 7608 ) on Tuesday January 18, 2000 @08:02AM (#1361338)
    Well, yahoo has hundreds, nay thousands, nay hundreds of thousands of "uniquely indexable pages" in their database. It's a web of links. How does one define unique?

    Really, this article says nothing. Unless it states (and it does not) *exactly* what they mean by "unique", I'm not going to take this seriously. A more interesting statistic (and one I haven't seen updated in a while) would be the information conversion ratio between the "RealWorld" and the web - i.e., how much of the information that you can find in a library can you also find online in its entirety. That is a more accurate measure of growth than raw page numbers.

  • by Dast ( 10275 ) on Tuesday January 18, 2000 @08:03AM (#1361339)
    49.5% Broken links to mp3s
    49.5% pr0n pages with javascript popups
    1% other

    We humans should be so proud of ourselves.

    :)
  • From their "more info" link, they counted more than 700,000 "unreachable" sites, vs. more than 4 million "reachable" sites.
  • How can 90% of Internet content be crud if over 50% of it is p0rn ;)
  • ...there are three types of lies:lies, damn lies, and statistics. Take from that what you will. BTW, 90% of everything is CRAP (or crud...or even s#|% depending on your frame of mind at any given time;^D)
  • They say they are the world's largest search engine and I get many hits spanning my pages from *.inktomisearch.com, but how do you search their site?

    Is inktomi publicly searchable? If it is not, then my pages wouldn't be publicly searchable. So, what's the point of them making connections to my sites?

    Is the following how you ban a site from your server?

    /etc/httpd/conf/access.conf (inside the relevant <Directory> block):
    order allow,deny
    allow from all
    deny from .inktomisearch.com
  • <sagan>Billions and billions of pages lost in the cosmic consciousness...</sagan>
  • According to Netcraft (http://www.netcraft.com/survey/), "In the December 1999 survey we received responses from 9,560,866 sites". If each site has 1000 pages (not terribly unreasonable) we're at 9.5 billion, nearly 10 times more than this PR-plug. And this is only counting static pages; my guess is that auctions on eBay do not count. I wonder if they count Deja - how many pages do you think they have in all those news groups?

    The Internet is large. Leave it at that.

    Cheers,
    Slak
  • Almost 4000 links point to rickymartin.com -
    I'm just curious if that was supposed to be impressive or disturbing. Of course, a good lot of those one billion pages are made by teenagers so-

    -Noiz,
    Who thinks Ricky Martin looks too much like a clone to be a "hottie".


    ---------
  • Did I fall asleep for 20 years, or are Inktomi's claims about its search software a little inflated? They stop just short of claiming to read my mind and provide the doc I want as soon as I open the browser.

    Someone please tell me if I'm missing some great coolness here. After all, I haven't used anything other than Google for months.
  • Shades of David Langford's "Net of Babel" (after Borges). Or see here [tlon.com] for a real demonstration that the 'net contains an infinite amount of data (although it'd be stretching to call it "information").

  • I had
    in.2032.the.world.as.we.know.it.will.self-destruct.com, whenever I was running "illegal"* servers off my university network.

    *There was nothing illegal about them, except that the university banned servers.
  • seems that I've been on websites that appeared to have more pages than that.

  • There are still a billion folks in the world who haven't even made a phone call.

    -Peace
    Dave
  • How can 90% of Internet content be crud if over 50% of it is p0rn ;)
    because 90% of online p0rn is crap too :+)

    erm

    or so I am told :+)
    --

  • Yeah, except that Inktomi doesn't index purely dynamic pages... nor give them much relevance.
  • Are they trying to claim that all pages are in English, French or Dutch? What does this indicate as to the rest of their research? I would have thought that the number of pages in Russian (Cyrillic) or one of the eastern languages such as Korean or Japanese, would have been statistically significant enough for inclusion. Makes me wonder about the validity of any of their numbers.
  • Maybe it will help as a Linux marketing tool, but Apache runs on so many different flavors of Unix that I doubt it can be attributed strictly as a win for Linux. The big sites are using Solaris and a lot use *BSD. There are some big sites using Linux (according to http://www.netcraft.com), such as eToys, DejaNews, and some little-known site called SlashDot. It is definitely a win for Open Source, and has been so for a long time.
  • From their details [inktomi.com] page:


    Apache 60.33%
    Microsoft-IIS 25.26%
    Netscape-Enterprise 3.79%
    Rapidsite 2.07%
    Lotus-Domino/Release 1.47%
    thttpd 1.37%
    WebSitePro 1.21%
    WebSTAR 0.93%
    Zeus 0.76%
    Stronghold 0.71%

    NCSA 0.47%
    CnG 0.34%
    BESTWWWD 0.34%
    Concentric 0.29%
    Roxen Challenger 0.20%
    Red Hat 0.17%
    mod_perl 0.16%
    tigershark/0.9.8-IC 0.13%


    Since when is RedHat a webserver and not a distribution? I'd like to know the method these guys used to get these stats, and why they listed Redhat as a server.

  • Inktomi and NEC Researcher: "Oh no!!! I can't remember if I counted our own web page. ARRRGGGHH!!! 1, 2, 3, 4, 5, ................."
  • From the press release:
    "By examining the entire Web and analyzing the billions of links between all of its documents, Inktomi can distill an index of the highest quality documents to provide users with
    more relevant and intuitive results."

    Isn't that the "technology" that google has patented?

  • > why did they list the number of links to rickymartin.com or cooking.com

    it's called an example

  • They're a tool to be used for many purposes. Luckily, it appears that in this case they're being used to represent facts (though nothing is ever quite as it appears to be). However, one has to wonder just how accurate these numbers are. You'd need an independent entity to do some type of verification, but then who'd verify those results? The verifier of the verifier, most likely.

    I guess if you keep verifying each set of results, we will eventually reach what could be collectively known as an "accurate" number. But who wants to spend all that time, when we can just take these numbers and assume that they're good? I admit, I certainly don't, and I am happily willing to say, "Hey Inktomi and NEC Research Institute, thanks for the thorough study and its subsequent report! I can now sleep better at night knowing that my one web page on the internet is confirmed to be not alone! Way to go!"

    But wait! How am I supposed to know that my one teeny website was included in their numbers?!? Hmmm, guess I'll have to run my own study just to verify, but then someone else will have to verify my report ... bah, screw it. I give up.
  • I.e.:
    1,000,000,000 (US)
    or
    1,000,000,000,000 (UK)

    There's a large difference.
  • Inktomi, last I looked, don't run a search engine site; they develop the tech and license it to others who get involved in the messy business of making a popular search engine site.

    IIRC, their highest profile customers are Hotbot [hotbot.com], who used Inktomi from the start, and Yahoo! [yahoo.com], who switched from Alta Vista to Inktomi. Inktomi is a more neutral backend for Yahoo!, who are competing in the same market as Alta Vista.

    Dave

    --

  • A lot of those web pages, unfortunately, are those cheap-o templates made by Angelfire users (the shopping list, etc.) that have absolutely no original content whatsoever and are never updated. I say we get rid of 'em.
  • http://www.newgrounds.com/assassin/index.html

    choose hamsterdance from the list.

    unless you have a small child with you. (it's the only cure)

  • You probably are searching it already; check out this page [inktomi.com] to see what sites are powered by Inktomi. --
  • by JoeBuck ( 7947 ) on Tuesday January 18, 2000 @08:42AM (#1361375) Homepage

    Google is one of the best search engines available for most purposes, because it ignores meta tags, and scores pages higher based on links to the site from other high-scoring pages (this is a recursive definition, but the recursion bottoms out; a toy sketch of the idea follows at the end of this comment).

    The result of this is that it gives useful results even when very common words are used. Try searching for Linux on Google. The first ten results are

    • linux.org
    • linux.com
    • www.debian.org
    • www.linuxworld.com
    • linux.davecentral.com
    • www.varesearch.com (VA Linux)
    • linux.corel.com
    • www.li.org (Linux International)
    • lwn.net (Linux Weekly News)
    • www.linuxhq.com

    While a human being might be able to come up with a better list, a machine came up with that list, based solely on the structure of the web. (I wonder why linux.davecentral.com rates so high -- possibly because it's attached to a high-ranking site, davecentral.com).

    ObAdvocacy: and Google runs on Linux.
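
    Roughly, that link-based scoring works like the toy sketch below. It is only an illustration of the idea, not Google's actual code; the little link graph and the 0.85 damping factor are invented for the example.

    # Toy link-based ranking: a page's score comes from the scores of the
    # pages that link to it, iterated until it settles down. A sketch of
    # the general idea only -- not Google's implementation.
    links = {                      # hypothetical tiny web: page -> pages it links to
        "linux.org":  ["linux.com", "debian.org"],
        "linux.com":  ["linux.org"],
        "debian.org": ["linux.org", "linux.com"],
        "myhomepage": ["linux.org"],
    }
    damping = 0.85                 # chance a random surfer follows a link vs. jumping anywhere
    pages = list(links)
    score = {p: 1.0 / len(pages) for p in pages}

    for _ in range(50):            # a few dozen iterations settle a graph this small
        new = {}
        for p in pages:
            incoming = sum(score[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - damping) / len(pages) + damping * incoming
        score = new

    for page, s in sorted(score.items(), key=lambda kv: -kv[1]):
        print(f"{s:.3f}  {page}")

    Pages that nobody links to (like "myhomepage" above) end up with only the small baseline score, which is why the recursion "bottoms out" instead of running forever.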

  • I believe the Inktomi software and database of links is the guts behind a couple of the search engines out there. If my memory serves me correctly, their technology powers www.hotbot.com [hotbot.com]
  • Who pissed in your cheerios?
  • How to keep Inktomi from indexing your site [inktomi.com]

    First, why do you not want them to index your site?

    Second, if you've read the other replies to your question, you might want to re-consider...

    Finally, I believe all search engines will ignore you if you follow the steps they give. That is, if they follow the rules.

  • And I thought that cable television was a vast waste land...

    all persons, living and dead, are purely coincidental. - Kurt Vonnegut
  • And of those 4 billion, probably 1 billion are on AOL and another billion are on yageohooties (Yahoo+Geocities)/angelfire/dragonfire/.../. This means that at a reasonable guess, a minimum of half of the pages on the net consist of a purple background (or image) with lime green text, broken html, and a couple dozen animated gifs reminiscent of a carnival (and no content beyond "Hi, my name is _____ I was born in _____, my drivers license, SSN, and major credit card are _____, _____, and _____.").

    Geez....I say that there are far too many people on the net who just don't belong, and freedom of speech or no, some people shouldn't be allowed to make web sites.


    Who am I?
    Why am I here?
    Where is the chocolate?
  • Inktomi sells their technology to other companies; they don't operate a search engine under their own name. HotBot [hotbot.com] is Inktomi-based; there are others as well but I don't know who.

  • 49.5% pr0n pages with javascript popups


    And to expand on that statement....each of those popups is another page adding to the "1 billion".

    So the ratio is like 1 pr0n page to 15 popups! =)



    Pablo Nevares, "the freshmaker".
  • by Anonymous Coward
    Inktomi are an American company - one billion is a thousand million. That's the number of docs they have in their index. Inktomi was around a long, long time before Google, and their technology is a rather cool cluster-based one. It currently runs on Solaris for their search. Part of the "battle" in the search market is over the size of the index that people store. Inktomi are currently trying to leapfrog their competitors (Altavista et al.), which they have done nicely. Most people have at some time or another used Inktomi's search indirectly through hotwired.com, yahoo.com or one of the many other portals Inktomi power. As for "other languages" - Inktomi are a multinational corporation providing services in Japan (goo.com) and a lot of European and South American countries.
  • On the topic of google, try doing a search for "More evil than satan" [google.com]. Some of the top ten hits should make you laugh.

    --Evan
  • Sorry, I wouldn't know anything about poking another man's anus.
  • An almost infinite number of monkeys banging away on a similar number of typewriters will eventually reproduce the works of Shakespeare.

    An almost infinite number of monkeys banging away on a similar number of typewriters will create...

    ... one hell of a mess!

  • Thanks for the good page with all the answers. It wasn't immediately obvious how to search their web site. You see, I get hundreds of log entries from inktomi.com and others over my 56K dialup. Naturally, I was curious to see what they are, so I went to their www site and didn't see anything useful until now.
  • I think this is a different measure.

    Netcraft's measure is by number of servers, while this measure is by number of pages.

    It's not surprising that they both agree, but it's certainly possible that larger sites might run a different server than the average site, causing a difference.

  • by henley ( 29988 ) on Tuesday January 18, 2000 @09:03AM (#1361394) Homepage

    Well, my take from the site is that what they're actually saying is "Look at our lovely indexing cluster. It can index 1 billion web thingies! Shouldn't you be buying a search engine product that powerful?"

    Or, in other words, it's another example of meaningless statistics spewed in the name of marketing, vaguely dressed up as serious research.

    References: Car MPG & top speed figures vs actual usage, Processor MHz as function of system throughput, quoted battery life as function of laptop utilisation, quaketest FPS compared to average internet multiplayer experience etc etc etc...

  • Hair splitting alert ON.

    The number of (different) pages on the web is actually infinite. Here [eleves.ens.fr] is a sample infinite component.

    (Actually it's finite because the maximal accepted length for a URL is finite. But it's way above the billions.)

    Note that these are not dynamic pages. Dynamic pages (i.e. pages whose content changes for the same URL) don't count: they're cheating.

    (The source used to generate this infinite number of pages is available under the GPL [quatramaran.ens.fr].)
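
    For the curious, the trick works something like the sketch below - a minimal, hypothetical Python version of such a generator, not the actual GPL'd source linked above. Every URL returns a valid page whose links point at still-deeper URLs, so a crawler that follows links never runs out of distinct pages (bounded only by the maximum URL length).

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class InfiniteSite(BaseHTTPRequestHandler):
        def do_GET(self):
            base = self.path.rstrip("/")   # derive child URLs from the requested path
            body = (f"<html><body><h1>Page {self.path}</h1>"
                    f'<a href="{base}/0">deeper (0)</a> '
                    f'<a href="{base}/1">deeper (1)</a>'
                    f"</body></html>")
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body.encode())

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), InfiniteSite).serve_forever()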

  • The one billion documents were found to be a plot by The Cult of Arthur C Clarke to end the Universe - each page having a unique name of God on it.
  • That still leaves 10,000,000 "other" pages.

    At a speed of one page per minute, it will take the rest of my life to read them all (about 57 years, considering that I can't read more than 8 hours a day: I'll also have to eat and sleep, ...).

    :-)
    ms
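
    A quick check of that arithmetic, using the same assumptions (one page per minute, eight reading hours a day, every day):

    pages = 10_000_000           # the 1% "other" slice of one billion pages
    minutes_per_day = 8 * 60     # eight reading hours a day
    print(pages / minutes_per_day / 365)   # about 57 years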

  • Many search engines give higher scores to pages that have the keywords in the hostname, so creating tons of subdomains to get every possible keyword into the hostname might actually get the page into top positions for several keywords.

    Guess it's time someone anti-microsoft gets microsoft.ms.windows.windows2000.windowsnt.office.office2000.2000.windows95.windows98.mswindows.mswindows2000.sucks.org. ;)
  • No, but if you check out http://taxes.com [taxes.com] you see it is a domain name for sale by greatdomains.com or something like that...

    it's phony.
    ---
  • There are also millions of (pointless) personal sites

    Why are personal sites pointless? Just because most of them aren't things you want to read doesn't make them, IMHO, useless.

    In fact it's the empowerment that enables Ordinary Joe to publish his personal page that makes the web what it is and not just a virtual shopping center.

    I'd just like all six billion people to be able to participate.


  • Congratulations to the WWW on this accomplishment (those of you who run or are members of porno sites: you don't deserve it). However, with the increasing number of homepages on the internet, who in the heck is gonna keep track of them all?? I am fully aware of the different directories and search engines, but they have such stringent rules for link submission that it discourages many newbies from starting a webpage. I am also fully aware of the necessity of META tags, but there are also many other criteria that I don't think anyone is really aware of. I have repeatedly e-mailed Yahoo! and Excite asking whether they have a criteria list for homepage submission, only to wind up with a reply from an automated service and then never hear from them again!! Luckily, news of my webpage gets around via word of mouth, not via some search engine, but I'm going to change that.

    But keeping track of all these billions of pages, will be difficult, and sooner or later, people are going to demand satisfaction! (Slap me with that glove again, and I'll give you satisfaction, in a .223 caliber!!)

    The Gray Wolf
  • Let's ask Mr. Owl-

    Mr Owl, How long does it take to count 1 000 000 000 links anyway?

    Mr Owl: "ah one, ah two, ah three *CRUNCH*-- ah three"

    There you go folks, it takes three to count 1 000 000 000 links. Thank you, Mr. Owl!

  • That's what I hate about such "statistics". No information or context is given. One is not told how this estimate of "one billion" is gotten.

    Remember, 53.4% of all statistics are invented on the spot. Of those, 63.1% are never checked against any reliable source. The rest are attributed to a survey done by Expensive Management Consultants [devnull]. You can buy a copy of the report from them for only $2499, which includes the introductory price of a year's subscription to their weekly newsletter containing the abstracts of other reports you can purchase, at a substantial 10% discount off the regular price that no one ever pays them anyway.
  • Actually, yes... It seems like Inktomi is borrowing ideas and technology from other rivals (Google and DirectHit), although Inktomi's results suck big time. All that technology stuff they write on their page is just that: crap. Try searching on Yahoo! for something and then switch to Web Pages view to see. Google rocks. Inktomi sux.

    Mark Papadakis, WebDeveloper

  • I just checked out hotbot to see if any of my sites (which are constantly hammered by the Inktomi crawler, as I'm sure is the case with most sites) would come up.

    No hits.

    Google finds them, though.

    Something's definitely amiss regarding Inktomi.
  • How can 90% of Internet content be crud if over 50% of it is p0rn ;)

    because 90% of online p0rn is crap too :+)

    Do you mean fecofilia, or just low quality? *impertinent smirk*

    --unDees

  • Note that Inktomi say the "number of reachable Web sites" is 4,217,324, while Netcraft found 9,560,866 last month. Isn't this a bit poor for a company that's trying to index the web? Even Netcraft reckons it isn't finding the whole web...
  • Hotbot uses Inktomi technology. They don't use Inktomi's database (I don't know who does).

  • Unfortunately that domain never resolves... I don't even think that it is valid. The search engine probably just pulled it out of a web page document somewhere. Not a surprise, given that it seems to think that Red Hat is an HTTP web server (below Roxen). :)
  • Okay, it isn't new and it isn't my idea originally, but I'll put a new spin on it. Is there a need for a moderated index to the most useful stuff on the web? Hey andover.net, I'm talking to you too. An index to everything open source related would be great. After all, an index to the whole web is a huge project that never ends and eventually sucks up all your free time. But it may be useful to have moderators rate the links on two factors:

    1. General usefulness of the information on the page/site. Good stuff is good, no matter how you got there.
    2. Specific applicability of the index to the page. Getting to the wrong good stuff or seeing too many links for a particular idea doesn't help.


    I'm willing to help moderate on some subjects.
  • hmm...
    so that means that if each and every page on the WWW were worth $100, then it would equal Bill Gates' pocket.
    that's nuts
  • Grabbing just one page from each server is going to be faster than spidering the entire site. Therefore I'd expect netcraft to be ahead of all the search engines.
  • by Anonymous Coward
    Try using milliard.
  • I believe the original thought experiment calls for an infinite number of monkeys. It does not say anything about the infinite volume of monkey shit that would be produced over the course of the experiment.

    The Internet does not represent an infinite number of users (at least, not yet), but you're still more likely to get an infinite volume of monkey shit out of it while you try to dig up the works of Shakespeare.

    Or you could save time and go here. [mit.edu]

  • The link you provided doesn't respond well. I think they've been slashdotted [tuxedo.org]. So I did a search at Google [google.com] for Hamsterdeath and found this [cotse.com]. Enjoy!
  • Check out Ghost sites [disobey.com]...

    Your Working Boy,
  • The one I really want
    is
    i.should.co.co

    but I dunno how to register a hostname in Colombia (or wherever CO is)

  • Hotbot uses Inktomi technology. They don't use Inktomi's database

    Ahh, I see now. They are crawling my sites but not letting anybody search the results unless they pay big bucks.

    Hmmm, looks like I'll be making a modification to my robots.txt files and possibly adding some new rules to my firewall.

    I should be allowed to find out what info about my sites they are trying to sell. If I can't, they won't be getting access.

  • Just another for-all-practical-purposes-meaningless statistic to nonetheless feel overwhelmed by, I suppose.

    If there were a billion pages to look at, I don't know when I'd have the time to do anything else, being the info-junkie that I am. Fortunately, a sufficient quantity of these pages do not interest me. :)

    Then, too, I wonder how many of these pages are de facto duplicates? ("Department of redundancy department, redundant division speaking ...") For instance, I'm right in the middle of moving my pages off of geocities and onto drak.net. At the moment, the pages that I've put up on drak.net that were part of my old geocities page still exist on geocities because I'm not done moving everything yet, and can't shut down my old page until EVERYthing is transported. I went through a similar process when I moved TO geocities from my college web page two and a half years ago.

    That also makes me wonder more about this statistic. Are there one billion ACTIVE pages, or merely one billion pages that have ever existed? If the former, how many pages have ever existed? That would be an interesting question ....

    Well, by making this post I'm probably creating yet another page and adding to the noise and confusion. Consider it my chaotic deed for the day. :)
