Slashdot: News for Nerds

Huge Traffic On Wikipedia's Non-Profit Budget

timothy posted more than 6 years ago | from the optimizing-smartitude dept.

Networking

miller60 writes "'As a non-profit running one of the world's busiest web destinations, Wikipedia provides an unusual case study of a high-performance site. In an era when Google and Microsoft can spend $500 million on one of their global data center projects, Wikipedia's infrastructure runs on fewer than 300 servers housed in a single data center in Tampa, Fla.' Domas Mituzas of MySQL/Sun gave a presentation Monday at the Velocity conference that provided an inside look at the technology behind Wikipedia, which he calls an 'operations underdog.'"


240 comments

Impressive (4, Insightful)

locokamil (850008) | more than 6 years ago | (#23920399)

Given that their topic sites are generally in the top three for any search engine query, the volume of traffic they're dealing with (and the budget that they have!) is very impressive. I always thought that they had much beefier infrastructure than the article says.

Re:Impressive (4, Funny)

VeNoM0619 (1058216) | more than 6 years ago | (#23920457)

Yes, and seeing how slashdot decided to try and slashdot them also helps...

Wikipedia = much more traffic than slashdot (5, Interesting)

Anonymous Coward | more than 6 years ago | (#23921887)

Slashdot does .. what? 40 mbit of traffic at peak? Wikipedia is roughly 100 times larger [nedworks.org]. (And WP has three datacenters, not one.)

Slashdot traffic hasn't created noticeable blips on Wikipedia's radar for years.

OTOH, if Wikipedia linked slashdot on every page, slashdot would go down, due to nothing else but bandwidth exhaustion.

Re:Impressive (3, Interesting)

sm62704 (957197) | more than 6 years ago | (#23920571)

I was always impressed with how fast pages loaded, after seeing how small their operation is I'm even more impressed now!

Go to any newspaper from the NYT to any one in a smaller city (say, Springfield's State Journal-Register) and the difference in load times is HUGE. Probably has to do with all the ads served from third party servers in the newspapers, what's the use of having a humungous server with giant pipes if your readers' pages have to wait for a flash ad served from a 486 powered by gerbils?

If I link to the SJR from one of my journals it slows down! I mean, I can see it if it's a front page slashdotting a little paper like that, but come on, a user journal?

And Wikipedia isn't all their servers serve; if I'm not mistaken, Uncyclopedia shares them. Impressive, indeed.

Re:Impressive (5, Informative)

David Gerard (12369) | more than 6 years ago | (#23922491)

No, actually - the Wikimedia servers serve all Wikimedia projects (all the Wikipedias, Wikimedia Commons, all the other projects), but Uncyclopedia is part of Wikia, which is a private company owned by Jimmy Wales to do wikis and isn't actually linked to the Wikimedia Foundation in any way.

Re:Impressive (3, Interesting)

Bandman (86149) | more than 6 years ago | (#23921085)

Yeah, a single datacenter seems really risky, especially considering some of the shenanigans [google.com] that have been going on.

Re:Impressive (4, Informative)

Achromatic1978 (916097) | more than 6 years ago | (#23921443)

Except there isn't. There are data centers in Europe and Asia, too, including one at a Yahoo facility - at least on this note, the article (or summary) is utterly wrong. Single datacenter? No.

Re:Impressive (2, Interesting)

Bandman (86149) | more than 6 years ago | (#23921571)

That would make a lot more sense.

Given the sheer amount of people who access it, it seems like the perfect use for GSLB [networkcomputing.com]

Re:Impressive (4, Informative)

David Gerard (12369) | more than 6 years ago | (#23922539)

Single database, though. All the databases for all the projects are in Tampa - one master for English Wikipedia and two for all the other 700+ Wikimedia projects.

(They tried running the databases for Asian languages from the Yahoo!-sponsored datacentre in Seoul for a while, but it didn't actually work much faster than it did with everything in Tampa.)

f1rst p0st ! (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#23920441)

f1rst p0st b1tch3s !!

Re:f1rst p0st ! (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#23920867)

I bet that wasn't your first demonstration of FAIL, either.

I've always wondered... (4, Insightful)

mnslinky (1105103) | more than 6 years ago | (#23920471)

It would be neat to have a deeper look at their budget to see how I can save money and boost performance at work. It's always nice having the newest/fastest systems out there, but it's rarely the reality.

Re:I've always wondered... (5, Funny)

Anonymous Coward | more than 6 years ago | (#23921481)

"It would be neat to have a deeper look at their budget to see how I can save money and boost performance at work."

Since they are using LAMP, obviously they could save money by following Microsoft's "Get The Facts" advice!

Re:I've always wondered... (5, Informative)

midom (535130) | more than 6 years ago | (#23922127)

I covered most of the Wikipedia technology bits in my MySQL Conference presentation last year: http://dammit.lt/uc/workbook2007.pdf [dammit.lt] (that's a quite detailed report)

It's easy... (1)

CarpetShark (865376) | more than 6 years ago | (#23922133)

If wikipedia is anything to go by, you just don't include a decent search engine.

The power of low standards (4, Insightful)

Itninja (937614) | more than 6 years ago | (#23920477)

From TFA: "But losing a few seconds of changes doesn't destroy our business."

Our organization's databases (we're also a non-profit) get several thousand writes per second. Losing 'a few seconds' could mean hundreds of users' record changes were lost. If that happened here, it would be a huge deal. If it happened regularly, it would destroy the business.

Re:The power of low standards (5, Insightful)

robbkidd (154298) | more than 6 years ago | (#23920799)

Okay. So pay attention to the sentence before the one you quoted which read, "I'm not suggesting you should follow how we do it."

Re:The power of low standards (5, Insightful)

Anonymous Coward | more than 6 years ago | (#23921057)

Don't be too harsh -- the standards are dependent on the application. Your application, by the nature of the information and its purposes, requires a different standard of reliability than Wikipedia does. You're certainly entitled to be proud of yourself for maintaining that standard.

But don't let that turn into being derogatory about the Wikipedia operation. Wikipedia has identified the correct standard for their application, and by doing so they have successfully avoided the costs and hassle of over-engineering. To each his own...

Re:The power of low standards (4, Interesting)

WaltBusterkeys (1156557) | more than 6 years ago | (#23921379)

Exactly. A bank requires "six nines" of performance (i.e., right 99.9999% of the time) and probably wants even better than that. Six nines works out to about 30 seconds of downtime per year.

It seems like Wikipedia is getting things right 99% of the time, or maybe even 99.9% of the time ("three nines"). That's a pretty low standard relative to how most companies do business.
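The "nines" arithmetic quoted above is easy to verify: permitted downtime per year is (1 − availability) multiplied by the seconds in a year. A quick sketch (plain Python, nothing Wikipedia-specific):

```python
# Check of the "nines" figures quoted above: allowed downtime per year
# at a given availability target is (1 - availability) * seconds/year.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def downtime_seconds(availability: float) -> float:
    """Seconds of permitted downtime per year at this availability."""
    return SECONDS_PER_YEAR * (1.0 - availability)

for label, avail in [("two nines", 0.99), ("three nines", 0.999),
                     ("six nines", 0.999999)]:
    print(f"{label}: {downtime_seconds(avail):,.1f} s/year")
# Six nines comes out to about 31.5 seconds per year, which matches the
# "about 30 seconds of downtime per year" figure above.
```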

Re:The power of low standards (5, Informative)

Nkwe (604125) | more than 6 years ago | (#23921765)

A bank requires "six nines" of performance (i.e., right 99.9999% of the time) and probably wants even better than that.

Banks don't require "six nines"; banks require that no data (data being money), once committed, get lost. The "nines" rating refers to the percentage of time a system is online, working, and available to its users. It does not refer to the percentage of acceptable data loss. It is acceptable for bank systems to have downtime, scheduled maintenance, or "closing periods" -- all of these eat into a "nines" rating, none of which lead to data loss.

Re:The power of low standards (1)

WaltBusterkeys (1156557) | more than 6 years ago | (#23921997)

The nines can refer to both.

I agree that banks can't withstand data loss, but they can withstand data errors. If there's a 30-second period per year when data doesn't properly move, and that requires manual cleanup, that's acceptable.

Re:The power of low standards (2, Insightful)

Waffle Iron (339739) | more than 6 years ago | (#23922185)

Indeed. Some of us are old enough to remember the days of "banker's hours" and before ATMs, when banks used to make their customers deal with less than "one two" (20%) availability.

Re:The power of low standards (2, Insightful)

astrotek (132325) | more than 6 years ago | (#23922409)

That's amazing, considering I get an error page on Bank of America around 5% of the time if I move too quickly through the site.

Re:The power of low standards (1)

ericspinder (146776) | more than 6 years ago | (#23921599)

Losing 'a few seconds' would mean potentially hundreds of users' record changes were lost. If that happened here, it would be a huge deal.
If you don't deal with financial data, it's likely that even your business would survive an event like that. Sure, if it happened all the time users would flee, but I haven't seen such problems at Wikipedia. He wasn't talking about doing it regularly, just that when disaster does strike, no pointy-haired guy appears to assign blame.

Re:The power of low standards (2, Informative)

MinuteElectron (1179725) | more than 6 years ago | (#23921625)

Changes are never just lost; when an error does happen and the action cannot be completed, it is rejected and the user is notified so they can try again. You have vastly overstated the severity of such issues.

Re:The power of low standards (1)

midom (535130) | more than 6 years ago | (#23922093)

that happens to us once every few years maybe ;-) the fact is that servers don't go down too often. --Domas

I was just thinking that (2, Funny)

imstanny (722685) | more than 6 years ago | (#23920481)

Every time I Google something, Wikipedia comes up near the top most of the time. Maybe that's why Google doesn't want to disclose its processing power; it may very well be a lot smaller than people assume.

Re:I was just thinking that (0)

Anonymous Coward | more than 6 years ago | (#23920813)

Yes, but anytime anyone googles anything, google has to do processing. Thus even though wikipedia is at the top for a small subset of searches (generally the ones for information as opposed to trends; commerce; specific blogs; &c.), google has a lot more work to do.

Let alone that google solves a logistic regression problem with (almost) every search, to figure out optimal adwords placements...

Re:I was just thinking that (1)

Albanach (527650) | more than 6 years ago | (#23921761)

Yes, but anytime anyone googles anything, google has to do processing.

I'd have thought they'd use a caching solution just like wikipedia. After all, just as Wikipedia has some very popular pages and some less so, Google has many popular searches and many less so. Wouldn't they cache these? After all if you're dealing with millions of searches for 'george carlin' you wouldn't want to go query your entire DB every time, would you?
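The commenter's idea can be sketched with a simple memoizing cache; `search_index` below is a hypothetical stand-in for the real, expensive index lookup, not anything Google actually exposes:

```python
from functools import lru_cache

def search_index(query: str) -> list[str]:
    """Hypothetical stand-in for the real (expensive) index lookup."""
    return [f"result for {query!r}"]

@lru_cache(maxsize=10_000)          # keep the hottest ~10k queries cached
def search(query: str) -> tuple[str, ...]:
    # Tuples (unlike lists) are immutable, so they're safe to hand out
    # from a shared cache.
    return tuple(search_index(query))

search("george carlin")             # first call: misses, hits the index
search("george carlin")             # repeat call: served from the cache
print(search.cache_info())          # hits=1, misses=1
```

For millions of identical queries like 'george carlin', every call after the first is a dictionary lookup rather than an index scan, which is exactly the economics the comment is pointing at.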

Re:I was just thinking that (1)

Spatial (1235392) | more than 6 years ago | (#23920851)

But why would they think it was a bad thing to expose? The whole "Look what we can do with so little" angle seems appealing; efficiency is something to boast about nowadays.

Re:I was just thinking that (2, Interesting)

imstanny (722685) | more than 6 years ago | (#23921153)

But why would they think it was a bad thing to expose? The whole "Look what we can do with so little" angle seems appealing; efficiency is something to boast about nowadays.
On one hand, you're right, efficiency is admirable. But on the other hand, if Google has insane amounts of processing power, it would likely mean much higher barriers to entry for its competitors. The threat of Google's power in processing such data could deter others from even attempting to compete with Google. After all, when Google started it was only funded with a few hundred thousand dollars.

Re:I was just thinking that (1)

Bandman (86149) | more than 6 years ago | (#23921629)

Ever pay attention to the render times, though?

Their infrastructure is scary-massive, from almost every report [datacenterknowledge.com]

Re:I was just thinking that (4, Interesting)

Chris Burke (6130) | more than 6 years ago | (#23921969)

I don't actually know anything about the total computing power Google employs, but I do know that they will purchase on the order of 1,000-10,000 processors merely to evaluate them prior to making a real purchase.

quick everybody (1, Redundant)

daveatneowindotnet (1309197) | more than 6 years ago | (#23920495)

read up on the Roman Republic now before Wikipedia gets Slashdotted

Easy to Increase the budget or add servers (5, Funny)

Subm (79417) | more than 6 years ago | (#23920513)

How hard can it be to increase the budget or add more servers?

Just go to the Wikipedia page with those numbers and change them. You don't even need to have an account.

Re:Easy to Increase the budget or add servers (1)

xkhaozx (978974) | more than 6 years ago | (#23920977)

Yeah, but those stupid admins keep reversing my changes.

Re:Easy to Increase the budget or add servers (0)

owlnation (858981) | more than 6 years ago | (#23921421)

Yeah, but those stupid admins keep reversing my changes.
What did you expect? Truth?

Re:Easy to Increase the budget or add servers (1)

khellendros1984 (792761) | more than 6 years ago | (#23922525)

I'm fine with truthiness, myself.

Re:Easy to Increase the budget or add servers (5, Funny)

elrous0 (869638) | more than 6 years ago | (#23921929)

In their defense, if you're going to run your entire site off a single server farm, a coastal city in Florida is the logical place to put it.

Some thoughts (1)

morgan_greywolf (835522) | more than 6 years ago | (#23920519)

"The traditional approach to availability isn't exactly our way," said Mituzas, who spoke about Wikipedia's infrastructure Monday at the O'Reilly Velocity conference.

More and more companies should look into approaches like this. Seriously. In tight economic times, a more ad-hoc approach saves money. People snubbed Google's approach to IT, and now it's becoming the standard in high availability for big-dollar projects. But what about the small-dollar approach? As economies slide into recession, you need to focus on a handful of highly talented IT people rather than an army of droids.

Re:Some thoughts (1)

Itninja (937614) | more than 6 years ago | (#23920695)

you need to focus on a handful highly-talented IT people rather than an army of droids
As long as these IT people are willing to work well below the industry pay-scale (often for free), then yeah, that would work great. Notice that most of the Wiki IT staff also have to have 'day jobs' to feed/clothe/house themselves.

Re:Some thoughts (2, Insightful)

bsDaemon (87307) | more than 6 years ago | (#23920749)

Which is somehow different from any other open source project how?

Re:Some thoughts (1)

Itninja (937614) | more than 6 years ago | (#23921367)

It's not. But the parent was implying that corporations should follow the same model. I was just pointing out that for-profit companies need to pay their people a bit more than non-profit love-in projects like Wikipedia do.

Re:Some thoughts (4, Insightful)

TheLazySci-FiAuthor (1089561) | more than 6 years ago | (#23920727)

"... you need to focus on a handful highly-talented IT people rather than an army of droids."

This is so true; I've always said, "you get what you pay for."

Do you want to pay for software, or do you want to pay for people?

Only one can create the other.

Re:Some thoughts (5, Funny)

morgan_greywolf (835522) | more than 6 years ago | (#23920983)

Do you want to pay for software, or do you want to pay for people?

Only one can create the other.

Oh, gods, let's hope so!

Re:Some thoughts (1)

madfancier (1111009) | more than 6 years ago | (#23922353)

Do you want to pay for software, or do you want to pay for people?

Only one can create the other.

Not in Soviet Russia.

Interesting but... (1)

wolf12886 (1206182) | more than 6 years ago | (#23920561)

Interesting to know, but I wish the article were more substantial than a list of tangential statistics. Also, although Wikipedia receives a hell of a lot of traffic, I bet it's at least an order of magnitude smaller than Google's.

If someone knows where we can find a good comparison between Wikipedia and others, as far as cost to traffic ratio, please speak up.

Maybe... (3, Funny)

nakajoe (1123579) | more than 6 years ago | (#23920577)

Datacenterknowledge.com might want to take lessons from Wikipedia as well. Slashdotted...

Note to self (5, Funny)

Anita Coney (648748) | more than 6 years ago | (#23920591)

If you ever find yourself in a flamewar on Wikipedia you cannot win, bomb Tampa, Florida out of existence.

Re:Note to self (5, Funny)

canajin56 (660655) | more than 6 years ago | (#23920949)

That's your solution to everything.

Re:Note to self (2, Funny)

TubeSteak (669689) | more than 6 years ago | (#23921677)

That's your solution to everything.
I did ask if you wouldn't prefer a nice game of Chess.
-WOPR

Re:Note to self (4, Interesting)

Ron Bennett (14590) | more than 6 years ago | (#23921073)

Or do a hurricane dance, and let nature do its thing...

Having all their servers in Tampa, FL (of all places, given the hurricanes, frequent lightning, flooding, etc. there) doesn't seem too smart - I would have thought, given Wikipedia's popularity, their servers would be geographically spread out across multiple locations.

Though doing that adds a level of complexity and cost that even many for-profit ventures, such as Slashdot, likely can't afford / justify; Slashdot's servers are in one place - Chicago ... to digress a bit, I notice this site's accessibility has been spotty (i.e. more page-not-found errors / timeouts) since the server move.

Ron

Re:Note to self (4, Informative)

OverlordQ (264228) | more than 6 years ago | (#23921791)

They're not all in Tampa; they have a bunch in the Netherlands and a few more in South Korea.

Re:Note to self (1)

LWATCDR (28044) | more than 6 years ago | (#23922277)

Tampa hasn't been hit by many hurricanes. They don't have issues with flooding that I know of, and lightning is lightning: it can happen anywhere, so just do your best to protect your systems from it.
If you are a few miles inland in Florida, hurricanes are not that big of an issue. If you have a good backup generator, it isn't that big of a problem.
Oh, did I mention I was born, live, and work in Florida? My office was hit by Frances, Jeanne, and Wilma. Total damage to the office... nothing. Total damage to my home? Three shingles.
Florida doesn't tend to suffer from widespread flooding like places in the Midwest, and really strong hurricanes like Andrew are actually very rare.
Most hurricanes in Florida would be a non-event if our power company kept the power up. We call Florida Power and Light "Florida Flicker and Flash."
For a data center, a backup power system is really all you need.

They were distributed at one time (0)

Anonymous Coward | more than 6 years ago | (#23922365)

This is not the first article on Wikipedia's infrastructure to grace Slashdot.

I seem to remember some data distribution (DB replicants) in other parts of the world.

I could be wrong!

Re:Note to self (1)

xpuppykickerx (1290760) | more than 6 years ago | (#23921943)

Please don't bomb Tampa. I will be homeless and very mad at you. Not that I will be able to post on Slashdot to express my anger.

More importantly (5, Interesting)

wolf12886 (1206182) | more than 6 years ago | (#23920755)

I don't care how few servers they have; what's more interesting to me is that they run an ultra-high-traffic site, which they aren't having trouble paying for, and do it without ads.

Simplicity (5, Interesting)

wsanders (114993) | more than 6 years ago | (#23921373)

Although much of the MediaWiki software is a hideous twitching blob of PHP Hell, the base functionality is fairly simple, runs perpetually, and scales massively as long as you don't mess with it.

What spoils a lot of projects like this is the constant need for customization. MediaWiki essentially can't be customized (except via plugins, obviously, which you install at your own peril), and that is a big reason why it scales so massively.

As for Wikipedia itself, I suspect it is massively weighted in favor of reads. That simplifies circumstances a lot.

Sure they do it without ads... (3, Informative)

DerekLyons (302214) | more than 6 years ago | (#23921751)

Sure, they do it without ad income. But they also do it without having to pay salaries, or colocation fees, or bandwidth costs... (I know they pay some of those, but they also get a metric buttload of contributions in kind.)

When your costs are lower, and your standard of service (and content) malleable, it is easy to live on a smaller income.

Re:Sure they do it without ads... (1)

quanticle (843097) | more than 6 years ago | (#23922339)

But they also do it without having to pay salaries, co location fees, or bandwidth costs...

Well, as far as salaries go, yeah, they don't have to pay for a full team of developers and administrators for the business, but they do need to pay people to go and check on the servers, replace faulty hardware, etc. Also, as far as colocation costs go, I'd say that running your own data center (i.e. providing your own electricity, cooling, backup power supplies, etc.) can't be cheap either.

What is the role of Open Source (1)

bogaboga (793279) | more than 6 years ago | (#23920823)

I wonder how much of a role open source software is playing in Wikipedia's operations. How much is it? Anyone in the know?

Re:What is the role of Open Source (4, Interesting)

KokorHekkus (986906) | more than 6 years ago | (#23921027)

The wiki software, MediaWiki, was written for Wikipedia and is licensed under the GPL (http://www.mediawiki.org/wiki/How_does_MediaWiki_work%3F [mediawiki.org]). According to Wikipedia, they use MySQL as their database and run it all on Linux servers.

Re:What is the role of Open Source (0)

Anonymous Coward | more than 6 years ago | (#23921603)

MediaWiki, BTW, is pretty great. I was just handed the task of setting up an organization-wide wiki, and I found it very easy to set up and customize. It runs fast with memcached and eAccelerator.

Re:What is the role of Open Source (2, Insightful)

guruevi (827432) | more than 6 years ago | (#23922481)

I don't know what else but open source you could use especially on the database side. You have only a few choices:

Microsoft ($$$) (approx. $50,000 per server per year in licensing costs since it's a public (unlimited CAL) enterprise-level site)
IBM ($$) (approx. $500,000 per year for leasing the whole operation, another load for support)
Oracle ($) (approx. $20,000 per backend and about 30 contractors for the next 5 years for the implementation)
Linux, MySQL, PHP (Free)

Not to mention, with Microsoft you'll need more servers to handle the same amount of load especially if you use Microsoft-based software package for the frontend as well (ASP.NET, MS CRM or SharePoint).

For IBM you'll have special hardware that nobody can handle but IBM certified support personnel.

For Oracle you're pretty much on your own anyway and you'll have to find a frontend.

Out like a light (1)

Joebert (946227) | more than 6 years ago | (#23920889)

300 servers housed in a single data center in Tampa, Fla.

Did Wikipedia go down when hurricane Charley etc. came through a few years ago?
I lost power for about a week when that happened, and I only live about 15 miles from Tampa, right over the Courtney Campbell Causeway actually.

Re:Out like a light (2, Informative)

timstarling (736114) | more than 6 years ago | (#23922069)

We've never lost external power while we've been at Tampa, but if we did, there are diesel generators. Not that it would be a big deal if we lost power for a day or two. There's no serious problem as long as there's no physical damage to the servers, which we're assured is essentially impossible even with a direct hurricane strike, since the building is well above sea-level and there are no external windows.

Re:Out like a light (1)

Joebert (946227) | more than 6 years ago | (#23922299)

Well then, I guess we all know where I'm going next time a hurricane rolls through. :)

What amazes me... (0)

Anonymous Coward | more than 6 years ago | (#23920917)

What amazes me is that not only do they manage all this traffic on such a small infrastructure, but even with them being on the front page of /., the site is still up.

Re:What amazes me... (1)

c0ol (628751) | more than 6 years ago | (#23921231)

Seriously? Slashdot is not even a blip on their traffic...

Re:What amazes me... (1)

Doug Neal (195160) | more than 6 years ago | (#23921787)

Correct [alexa.com]

Re:What amazes me... (4, Interesting)

ceejayoz (567949) | more than 6 years ago | (#23921533)

Slashdot is great at taking down sites on crappy shared hosting, but anything with a decently configured dedicated server will likely survive just fine.

Wikipedia's probably getting hit with hundreds of times the traffic Slashdot is at all times.

Re:What amazes me... (1)

quanticle (843097) | more than 6 years ago | (#23922425)

To be quite honest, I'd say that the Slashdot surge is probably a drop in the bucket as far as Wikipedia is concerned. I mean, they're the top result for loads of Google queries, and plenty of people go straight to Wikipedia when they need to look something up.

Off-topic, I know, but...what about /.'s hardware? (5, Interesting)

kiwimate (458274) | more than 6 years ago | (#23920963)

I.e. the promised follow-up to this story [slashdot.org] about moving to the new Chicago datacenter? You know, the one where Mr. Taco promised a follow-up story "in a few days" about the "ridiculously overpowered new hardware".

I was quite looking forward to that, but it never eventuated, unless I missed it. It's certainly not filed under Topics->Slashdot.

Re:Off-topic, I know, but...what about /.'s hardwa (1)

larry bagina (561269) | more than 6 years ago | (#23921967)

Remember when CmdrTaco called wikipedia a fad and said they couldn't scale? It was during the last (only?) slashdot IRC "interview" a few years back. Just before wikipedia overtook /. in traffic.

Tampa? (0)

QuietLagoon (813062) | more than 6 years ago | (#23920985)

300 servers housed in a single data center in Tampa, Fla.

Does anyone see the lack of planning that resulted in the placement of a major data center in the thunderstorm and lightning-strike capital of the world?

Re:Tampa? (2, Funny)

nickull (943338) | more than 6 years ago | (#23921041)

Not to mention hurricanes and faulty electronic voting machines.... ;-)

Re:Tampa? (2, Informative)

midom (535130) | more than 6 years ago | (#23922437)

add power costs, difficulty of travel, possible flooding, etc. it's all historic reasons; we can't just migrate datacenters at will - that requires quite a high investment. and the datacenter choice was simply because the founder lived there in 2001, when all we needed was a single server. --Domas

Works great because it's not "Web 2.0" (5, Insightful)

Animats (122034) | more than 6 years ago | (#23921009)

Most of Wikipedia is a collection of static pages. Most users of Wikipedia are just reading the latest version of an article, to which they were taken by a non-Wikipedia search engine. So all Wikipedia has to do for them is serve a static page. No database work or page generation is required.

Older revisions of pages come from the database, as do the versions one sees during editing and previewing, the history information, and such. Those operations involve the MySQL databases. There are only about 10-20 updates per second taking place in the editing end of the system. When a page is updated, static copies are propagated out to the static page servers after a few tens of seconds.

Article editing is a check-out/check in system. When you start editing a page, you get a version token, and when you update the page, the token has to match the latest revision or you get an edit conflict. It's all standard form requests; there's no need for frantic XMLHttpRequest processing while you're working on a page.

Because there are no ads, there's no overhead associated with inserting variable ad info into the pages. No need for ad rotators, ad trackers, "beacons" or similar overhead.
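The check-out/check-in scheme described above is classic optimistic locking and can be sketched in a few lines. The class and method names here are purely illustrative, not MediaWiki's actual code:

```python
# Sketch of check-out/check-in editing: the edit form carries a token
# (the revision id it was based on), and a save succeeds only if that
# token still matches the latest revision. Names are made up here for
# illustration; this is not MediaWiki's actual API.

class EditConflict(Exception):
    """Raised when someone else saved between checkout and save."""

class Page:
    def __init__(self, text: str = ""):
        self.revision = 0
        self.text = text

    def checkout(self) -> tuple[int, str]:
        """Begin editing: hand back the current revision id and text."""
        return self.revision, self.text

    def save(self, token: int, new_text: str) -> int:
        """Check in: reject the edit if the page moved on underneath us."""
        if token != self.revision:
            raise EditConflict(
                f"edit based on r{token}, but latest is r{self.revision}")
        self.revision += 1
        self.text = new_text
        return self.revision
```

Two editors who check out the same revision will race: the first save wins, the second raises `EditConflict` and has to merge, which is exactly the edit-conflict behaviour the comment describes, all with ordinary form requests.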

Re:Works great because it's not "Web 2.0" (0)

Anonymous Coward | more than 6 years ago | (#23921201)

+1 Insightful. I would do the mod myself, if I could just find those darn mod points I had last week...

Re:Works great because it's not "Web 2.0" (1)

internic (453511) | more than 6 years ago | (#23921595)

Oh really? Because O'Reilly seems to think it is [oreillynet.com], and I thought he was the main pusher of this terminology. Is the term Web 2.0 actually meaningful?

Re:Works great because it's not "Web 2.0" (2, Informative)

Tweenk (1274968) | more than 6 years ago | (#23922051)

If you haven't noticed, "Web 2.0" is a long-established buzzword [wikipedia.org] - which means it carries little meaning, but it looks good in advertising. Just like "information superhighway", "enterprise feature" or "user friendly".

Re:Works great because it's not "Web 2.0" (1, Informative)

Anonymous Coward | more than 6 years ago | (#23921703)

There is practically no such thing as a static page in Wikipedia. We're running two small Wikipedia mirror clusters, and it's quite obvious that if you don't run memcached alongside Apache, all pages are rendered from the database on demand, for every single request. Large and complex pages (e.g. on Hydrogen or Gold) take more than 1 second to render even on the fastest CPUs available.
You make things sound cheap and simple, but without the memcached and Squid clusters Wikipedia is using, the whole thing would require significantly more hardware than the foundation could afford.
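The memcached arrangement this comment describes is the classic cache-aside pattern. A minimal sketch, with a plain dict standing in for memcached and `render_page` as a hypothetical stand-in for the expensive wikitext-to-HTML parse:

```python
import time

# Cache-aside: serve a rendered page from cache when possible; fall back
# to the (expensive) render only on a miss. A dict with timestamps stands
# in for memcached; render_page is a hypothetical stand-in for the parser.

CACHE: dict = {}     # title -> (stored_at, html)
TTL = 60.0           # seconds a rendered page stays fresh

def render_page(title: str) -> str:
    """Pretend this takes ~1s of CPU, as the comment above reports."""
    return f"<html><body>{title}</body></html>"

def get_page(title: str) -> str:
    now = time.monotonic()
    hit = CACHE.get(title)
    if hit is not None and now - hit[0] < TTL:
        return hit[1]               # cache hit: no parsing at all
    html = render_page(title)       # cache miss: render once...
    CACHE[title] = (now, html)      # ...and store for subsequent readers
    return html
```

With a read-heavy workload like Wikipedia's, the render cost is paid roughly once per TTL per popular page instead of once per request, which is why removing the cache tier multiplies the hardware bill.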

Re:Works great because it's not "Web 2.0" (0)

Anonymous Coward | more than 6 years ago | (#23921753)

The search box is using XMLHttpRequest for a context-sensitive combo dropdown now.

so what's "Web 2.0"? (1)

gbjbaanb (229885) | more than 6 years ago | (#23921783)

I take it that "Works great because it's not 'Web 2.0'" means it's fast and dynamic, whereas Web 2.0 generally means slow and dynamic.

The technology behind it is irrelevant: if content is provided by users, then it's Web 2.0 (as I understand the term), so Wikipedia definitely is Web 2.0; it's just that they have some fancy caching mechanism to get the best of both worlds. If only more systems were built in a pragmatic way instead of worrying about what they're "supposed" to be.

Nonsense. Wikipedia is THE web 2.0 (4, Insightful)

Nicolas MONNET (4727) | more than 6 years ago | (#23921799)

Web 2.0 is not just about flashy Ajax or whatnot; it's about user-generated dynamic content. WP's "everything is a wiki" architecture might /look/ a bit archaic compared to fancy-schmancy dynamic rotating animated gradient-filled forums, but it's much more powerful.
Moreover, WP is not a collection of static pages: if you're logged in at least, every page is dynamically generated, and every page's history is updated within a few seconds.

I hate web sites that are broken on purpose! (0)

Anonymous Coward | more than 6 years ago | (#23921025)

How the hell are we supposed to read the text with an ad hiding the text? What idiot decided that it was a good decision to go to the hard work to create content only to hide it?

Confused by the title (5, Insightful)

Just Some Guy (3352) | more than 6 years ago | (#23921179)

What does "Non-Profit Budget" mean, anyway? There are non-profits bigger than the company I work for. Non-profit isn't the same as poorly financed.

Re:Confused by the title (1)

perbu (624267) | more than 6 years ago | (#23921885)

I guess it means that the budget is not necessarily scaled the same way it would be if they were a commercial company. In a commercial company more traffic means more money - not so for WP.

Link to wikipedia? (4, Funny)

Luyseyal (3154) | more than 6 years ago | (#23921229)

The summary was wrong to include a link to the Wikipedia homepage without a Wikipedia link about Wikipedia [wikipedia.org] in case you don't know what Wikipedia is. I myself had to Google Wikipedia to find out what Wikipedia was so I am providing the Wikipedia link about Wikipedia in case others were likewise in the dark regarding Wikipedia.

-l

P.s., Wikipedia.

Re:Link to wikipedia? (1)

Chaotic Spyder (896445) | more than 6 years ago | (#23922023)

you must be new here

Re:Link to wikipedia? (4, Funny)

hansamurai (907719) | more than 6 years ago | (#23922305)

Wait, what's this Google thing you're talking about?

Distributed computing? (1)

Bombula (670389) | more than 6 years ago | (#23921257)

I'm kind of surprised there's not been more talk about a distributed computing effort for Wikipedia. Seems like it would be a good candidate. I'm more of an honorary geek than an actual hardcore tech-savvy person - does anyone know if a distributed computing effort could work? I don't really see any problem with data integrity, since it's not confidential and is open to editing by definition (except maybe user info?), so it'd basically be a big asymmetric RAID, right? I would worry more about it having fast enough response times - but maybe even that isn't so much of an issue given the nature of Wikipedia's content. I suppose syncing the data as it gets edited would be the biggest issue... But what do I know?

Thoughts, everyone?

Re:Distributed computing? (1)

Tweenk (1274968) | more than 6 years ago | (#23922309)

The problem is that there's not much to compute at Wikipedia. The limiting factor is bandwidth. A distributed web cache like Coral Cache [coralcdn.org] might work, but this generally isn't called distributed computing, just like P2P networks aren't. The main problem would be that web caches have high update latency, but probably it wouldn't matter too much on Wikipedia.
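A distributed web cache along these lines typically maps each URL to a cache node with consistent hashing, so adding or losing a node only remaps a small fraction of keys. This is an illustrative sketch of that one idea, not how CoralCDN actually works internally:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: maps URLs to cache nodes with minimal reshuffling."""
    def __init__(self, nodes, replicas=100):
        # Each node appears `replicas` times on the ring for smoother balance.
        self.ring = sorted(
            (self._hash("%s:%d" % (node, i)), node)
            for node in nodes for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, url):
        # Walk clockwise to the first node at or after the key's hash.
        i = bisect.bisect(self.keys, self._hash(url)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("/wiki/Hydrogen")   # same URL always maps to same node
```

As the parent says, though, the bottleneck for Wikipedia is bandwidth and freshness, not the key-to-node mapping.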

Cached on servers all over the interweb? (1)

ClarisseMcClellan (1286192) | more than 6 years ago | (#23921393)

In the early days of the WWW the idea with popular pages was that they could be cached all over the internet. Your server checks with their server and if it has the page in cache already then that is what gets served up. What happened to that idea, and why can't Wikipedia work like that, with only obscure and new pages getting served up from Florida?
Those 300 servers are one of the wonders of the world and if you have never made an edit then you should. There must be something you can add to the whole.
There has been much talk of other encyclopaedias but I am still waiting.

Re:Cached on servers all over the interweb? (1)

IamTheRealMike (537420) | more than 6 years ago | (#23922087)

Lots of ISPs run transparent caching proxy servers, so Wikipedia could be cached if they wanted. They set their headers to prevent that, though, presumably so changes show up immediately.
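The mechanism is the Cache-Control response header: `private` tells shared proxies not to store a page, while a long `s-maxage` would invite them to. This sketches the general idea, not Wikipedia's actual header values:

```python
def cache_headers(logged_in):
    """Build response headers controlling what shared (proxy) caches may do."""
    if logged_in:
        # Per-user page: shared caches must not store or reuse it.
        return {"Cache-Control": "private, must-revalidate, max-age=0"}
    # Anonymous page: allow shared caches to keep it for up to an hour.
    return {"Cache-Control": "s-maxage=3600, must-revalidate, max-age=0"}

anon = cache_headers(logged_in=False)
user = cache_headers(logged_in=True)
```

A site that wants edits to show up immediately either marks pages `private`/`no-cache` for proxies it doesn't control, or caches them only in proxies it can purge itself (which is what Wikipedia's own Squid layer does).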

It's called proxy server. (1)

Tweenk (1274968) | more than 6 years ago | (#23922489)

In the early days of the WWW the idea with popular pages was that they could be cached all over the internet. Your server checks with their server and if it has the page in cache already then that is what gets served up.
This is called "proxy server". Ask your ISP whether they have one. By taking this a bit further where multiple proxies can exchange data directly we have a distributed Web cache. See www.coralcdn.org [coralcdn.org] for an example of that. It works on Wikipedia pages too.

Servers and locations (2, Informative)

Anonymous Coward | more than 6 years ago | (#23922091)

According to http://meta.wikimedia.org/wiki/Wikimedia_servers [wikimedia.org] Wikimedia (and by extension, Wikipedia):

"About 300 machines in Florida, 26 in Amsterdam, 23 in Yahoo!'s Korean hosting facility."

also: http://meta.wikimedia.org/wiki/Wikimedia_partners_and_hosts [wikimedia.org]

moral: it's easy to be third rate (0)

Anonymous Coward | more than 6 years ago | (#23922181)

1. Millions of static pages can be served at a very high rate from a single modern server.

2. Editing is basically (a) get token (b) edit page (c) submit revisions with token (d) hope you didn't conflict with someone else's edits [wikipedia.org] , in which case you've got to manually fix things.

3. Lack of in-order human oversight. Wikipedia is powered by a gaggle of zealots, not organised humans, and the rule is "latest change produces current page". That's way more easy to implement than a system which involves some sort of review process.

4. Wikipedia operates like a religion with volunteer ministers and one charismatic leader. To paraphrase Bush, it's a whole lot easier to run a group when there's just one dictator and everyone's working toward his whims. "Lowest common denominator fits all" is very easy to engineer but rarely produces progress.

5. Because Wikipedia is operated as a religion rather than a business or charity, no-one gets hurt (except the charismatic leader) if there's data loss or failure, and volunteers are very tolerant of what they're given. It's unnecessary to implement the kind of safeguards against financial loss that any site of Wikipedia's size would normally have to implement.

In other news, a modern desktop can have n people logged in simultaneously typing `less ObjectivismIsAboutFreeWorkers.txt' while another n/100 are in the middle of `vi ObjectivismIsAboutFreeWorkers.txt'.
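The token dance in point 2 is ordinary optimistic concurrency: remember which revision your edit was based on, and refuse the save if someone else committed in between. A minimal sketch with made-up names (not MediaWiki's real API):

```python
class Page:
    """Optimistic concurrency: each edit carries the revision it was based on."""
    def __init__(self, text=""):
        self.text = text
        self.revision = 0

    def get_token(self):
        return self.revision            # token = the revision the editor saw

    def submit(self, token, new_text):
        if token != self.revision:      # someone else saved first
            raise ValueError("edit conflict: merge by hand")
        self.text = new_text
        self.revision += 1

page = Page("original text")
alice_token = page.get_token()
bob_token = page.get_token()            # Bob opens the editor at the same time
page.submit(alice_token, "alice's edit")  # first save wins
# page.submit(bob_token, "bob's edit") would now raise an edit conflict,
# which is the "hope you didn't conflict" step in point 2 above.
```

Nothing here needs locks or server-side sessions, which is part of why the scheme is cheap to run at scale.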
