I'm not going to address the morality of the war today. It's a huge topic, and a can of worms I simply can't open. What I would like to chat about is dealing with traffic spikes on a major website. That is an area I'm pretty familiar with.
Last night I posted our largest comment-grabbing story ever. Currently it has over 3200 comments. It was posted at 10:05 EST, which means it was off peak traffic hours. This is almost identical to the story we posted following the Columbia explosion: thousands of comments on a single story, but occurring during off hours.
Last night our traffic was approximately the same as during a normal afternoon. That is to say 40-50 pages a second. We managed to hold up just fine. In fact, much better than during the Columbia story. This is largely because we have more servers in the cluster, and we ended up transferring one server from our SSL cluster (currently in testing) to the pool. This caused the average load in the comment cluster pool to drop by half.
In preparation for more wartime coverage, we've made a few changes. One was to remove the Next/Prev links from article.pl. Those are relatively expensive DB calls, and when more users view articles, those 2 queries per article.pl add up. Someday we'll optimize them better, but they are actually quite tricky to do properly since next/prev are relative to the user. Any number of things affect them (subscribers see stories in the future, for example).
We're also going to move the AC default threshold to 2. Logged-in users won't be affected, and ACs can always drop it if they want, but this means they'll be more likely to see better comments, and hopefully get smaller pages and fewer clicks.
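To make the threshold idea concrete, here is a minimal sketch of score filtering. It is not Slash code: the `visible_comments` function, the dictionary layout, and the sample scores are all made up for illustration. The only thing taken from above is the new anonymous default of 2.

```python
# Hypothetical sketch, not Slash's implementation.
AC_DEFAULT_THRESHOLD = 2  # the new anonymous default described above


def visible_comments(comments, threshold=AC_DEFAULT_THRESHOLD):
    """Return only the comments scored at or above the threshold."""
    return [c for c in comments if c["score"] >= threshold]


# Made-up sample data, purely for illustration.
sample = [
    {"id": 1, "score": -1},
    {"id": 2, "score": 1},
    {"id": 3, "score": 2},
    {"id": 4, "score": 5},
]

if __name__ == "__main__":
    # At the default threshold only the higher-scored comments remain.
    print([c["id"] for c in visible_comments(sample)])
    # A reader who drops the threshold sees everything again.
    print([c["id"] for c in visible_comments(sample, threshold=-1)])
```

Fewer comments passing the filter is where the smaller pages come from. How much smaller depends entirely on how scores are distributed in a given discussion.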
Another change we're considering is the comment splits. Currently we split pages at 100 comments. We're considering dropping that number to 50 or something. The theory is that more-but-smaller pages will result in snappier performance overall for everyone.
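The trade-off is easy to put numbers on. A rough sketch, assuming pages are simply filled to the split size and ignoring thresholds and threading (the `pages_needed` helper is hypothetical, not something in Slash):

```python
import math


def pages_needed(total_comments: int, split_size: int) -> int:
    """Number of pages a discussion occupies at a given split size."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    # Even an empty discussion still renders one page.
    return max(1, math.ceil(total_comments / split_size))


if __name__ == "__main__":
    for split in (100, 50):
        print(split, pages_needed(3200, split))
```

For a 3200-comment story that works out to 32 pages at a split of 100 and 64 pages at a split of 50, so each page carries half as many comments but readers who want everything click twice as often.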
Of course the obvious answer is more metal. We're also trying to see if we can't scrounge up more boxes for the comments pool. If we get a 30-40% boost in traffic, it would be nice to have at least a 10-20% increase in hardware powering it.
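A back-of-the-envelope version of that hardware math, assuming load spreads evenly across boxes and scales linearly with traffic (both simplifications, and the function name is just for illustration):

```python
def relative_load_per_box(traffic_growth: float, hardware_growth: float) -> float:
    """Per-box load after growth, relative to today's per-box load (1.0 = unchanged).

    Both arguments are fractions, e.g. 0.30 for a 30% increase.
    """
    return (1 + traffic_growth) / (1 + hardware_growth)


if __name__ == "__main__":
    # Worst case from the ranges above: most traffic, least new hardware.
    print(relative_load_per_box(0.40, 0.10))
    # Best case: least traffic, most new hardware.
    print(relative_load_per_box(0.30, 0.20))
```

Under those assumptions, each box ends up somewhere between roughly 8% and 27% busier than it is now, so the extra hardware softens the spike rather than absorbing it.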
The good news is that because of how we now divide our traffic between smaller, more focused and optimized clusters, huge discussions shouldn't (much) affect pages like the index or static articles. It's pretty clever, and Krow should be proud since he championed this division against my better judgement. It really works quite well: even when comments start bogging down, the index still serves up relatively snappily. Since more than half of our page loads are index pages, that means only good stuff.
CNN crapped out last night, but we held up just fine.
The last thing we're discussing is logical ways to split discussions. Slash is fine with stories until we get above 1500 or so comments with lots of people actively using them simultaneously. If we could split discussions somehow, without making the UI too intrusive for readers, that may ultimately be the kludge that gets us through the next traffic bursts. 3 stories with 1k comments in them would ultimately be much faster than one story with 3k comments.
Longer term of course we plan to optimize around this so that we don't have so much concern about sizes of discussions. It used to be that 500 was the breaking point for discussions. Then a thousand. Now we need 1500 during prime time for things to really break down. So we're getting better with age...
Anyway, the next few days look to be the kind where it takes everything you've got to survive. I suspect all of Slashteam will be losing a few winks, so thanks to krow, jamie, pudge, cowboyneal in advance.