
Slashdot.org Self-Slashdotted

kdawson posted more than 5 years ago | from the disturbances-in-the-fabric dept.

Announcements 388

Slashdot.org was unreachable for about 75 minutes this evening. Here is the post-mortem from Sourceforge's chief network engineer Uriah Welcome. "What we had was indeed a DoS, however it was not externally originating. At 8:55 PM EST I received a call saying things were horked, at the same time I had also noticed things were not happy. After fighting with our external management servers to log in, I finally was able to get in and start looking at traffic. What I saw was a massive amount of traffic going across the core switches; by massive I mean 40 Gbit/sec. After further investigation, I was able to eliminate anything outside our network as the cause, as the incoming ports from Savvis showed very little traffic. So I started poking around on the internal switch ports. While I was doing that I kept having timeouts and problems with the core switches. After looking at the logs on each of the core switches they were complaining about being out of CPU, the error message was actually something to do with multicast. As a precautionary measure I rebooted each core just to make sure it wasn't anything silly. After the cores came back online they instantly went back to 100% fabric CPU usage and started shedding connections again. So slowly I started going through all the switch ports on the cores, trying to isolate where the traffic was originating. The problem was all the cabinet switches were showing 10 Gbit/sec of traffic, making it very hard to isolate. Through the process of elimination I was finally able to isolate the problem down to a pair of switches... After shutting the downlink ports to those switches off, the network recovered and everything came back. I fully believe the switches in that cabinet are still sitting there attempting to send 20Gbit/sec of traffic out trying to do something -- I just don't know what yet. Luckily we don't have any machines deployed on [that row in that cabinet] yet so no machines are offline.
The network came back up around 10:10 PM EST."


388 comments


Do you get the pink screen? (4, Funny)

BadAnalogyGuy (945258) | more than 5 years ago | (#26793423)

So if you hammer your own servers, do you have to send an email to krow to get your privileges restored?

Re:Do you get the pink screen? (4, Funny)

MindlessAutomata (1282944) | more than 5 years ago | (#26793463)

The manager that did that at a restaurant I used to work at got his privileges revoked, instead.

Wow, that sucks (2, Interesting)

drachenstern (160456) | more than 5 years ago | (#26793427)

So why didn't y'all have access from the home office?

Re:Wow, that sucks (3, Insightful)

Arthur Grumbine (1086397) | more than 5 years ago | (#26793653)

And "access from the home office" would allow them to do what exactly?!?

Re:Wow, that sucks (5, Funny)

jd (1658) | more than 5 years ago | (#26794027)

Act as a data source to Excel.

Thanks for the information (5, Funny)

sleeponthemic (1253494) | more than 5 years ago | (#26793429)

Now if you could just post the link to the form where I can claim my full refund (for time not wasted incurred) I'll go back to being a loyal "customer".

Re:Thanks for the information (5, Funny)

Anonymous Coward | more than 5 years ago | (#26793479)

Okay, here is the link: http://slashdot.org/subscribe.pl

You probably owe about $10 for your time not wasted.

Re:Thanks for the information (5, Funny)

Arthur Grumbine (1086397) | more than 5 years ago | (#26793681)

I don't know about you, but I'm suing for punitive damages. Do you have any idea how much pain and suffering the work I did in that time caused me?!

In Soviet Russia (5, Funny)

MindlessAutomata (1282944) | more than 5 years ago | (#26793431)

In Soviet Russia, Slashdot slashdots Slashdot!

Re:In Soviet Russia (5, Funny)

ocularDeathRay (760450) | more than 5 years ago | (#26793599)

the headline is confusing, was the problem caused by a recursive dupe or something?

I didn't read the rest of the summary cause it is longer than my finger and that is how we used to roll on the dialup BBSs... never read anything longer than your finger held up to the screen. this message is only intended for people of all finger sizes.

Re:In Soviet Russia (0)

Anonymous Coward | more than 5 years ago | (#26793603)

Buffalo buffalo buffalo, buffalo buffalo buffalo...

Re:In Soviet Russia (5, Informative)

robophilosopher (847226) | more than 5 years ago | (#26793665)

I believe you mean: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. The capitalization matters. In other words, buffalo from the city of Buffalo that are pushed around by (other) buffalo from the city of Buffalo in turn push around (still more) buffalo from the city of Buffalo. And you thought this was unrelated to the recursive dupe comment.

Re:In Soviet Russia (-1, Offtopic)

jo42 (227475) | more than 5 years ago | (#26794189)

In Soviet Russia ...

1. Meme Very Tired. No Longer Wired.
2. 'Soviet Russia' ceased to exist last century.
3. Profit!!!

Re:In Soviet Russia (5, Funny)

Anonymous Coward | more than 5 years ago | (#26794297)

Yo dawg, I herd u like Slashdot so I slashdotted your Slashdot!

Sabotage? (0, Troll)

Luke727 (547923) | more than 5 years ago | (#26793439)

I suspect the Jews are behind it. They were probably AIDed by niggers.

Slashdotted slashdot... (0)

Anonymous Coward | more than 5 years ago | (#26793443)

Yo dawg... We slashdotted your slashdot so you can time out while waiting to time out!

Nice. What's next, a slashdot slasher flick?

Re:Slashdotted slashdot... (4, Funny)

Inner_Child (946194) | more than 5 years ago | (#26793963)

I can see it now, a Michael Bay slasher/suspense flick (with explosions!) called Dupe. A group of teenagers decide to troll an online forum, but they quickly realize all is not as it seems when they discover a conspiracy to keep duplicate stories coming in order to increase advertising dollars masterminded by the evil genius Captain Burrito. Violence and hilarity ensue.

And before anyone says this is a shitty plot... I *did* say Michael Bay.

good timing (1)

ghyspran (971653) | more than 5 years ago | (#26793445)

pretty impressive. i loaded, got an ISE, then reloaded and it worked. good timing for me i'd say

Frost Nixon (-1, Troll)

Anonymous Coward | more than 5 years ago | (#26793447)

Frost Nixon

A.I. (5, Funny)

gmuslera (3436) | more than 5 years ago | (#26793451)

Probably the biggest proof that Slashdot has become sentient is that it is willing to kill itself rather than see another batch of Idle videos.

Re:A.I. (5, Funny)

BLT2112 (1372873) | more than 5 years ago | (#26793511)

Like the poet from HHGG whose own intestines leaped out of his throat to strangle himself...

*Sniff* they grow up so fast! (4, Funny)

exley (221867) | more than 5 years ago | (#26793455)

Slashdot has apparently learned how to masturbate, because it is now fucking with itself!

Re:*Sniff* they grow up so fast! (-1, Redundant)

xous (1009057) | more than 5 years ago | (#26793475)

Sounds like someone doesn't know how to configure their switches properly.

Re:*Sniff* they grow up so fast! (0)

Anonymous Coward | more than 5 years ago | (#26793527)

I tried telling my mother that, but she still took away my best porno mags :-(.

Re:*Sniff* they grow up so fast! (5, Insightful)

adolf (21054) | more than 5 years ago | (#26793663)

Naw. Stuff sometimes, yaknow, happens. People sometimes make mistakes, and hardware sometimes just breaks. It's not always ignorance -- especially, I'd guess, at the level of Slashdot's back end.

I once implemented a VoIP phone system at a factory in an evening. (This, in itself, was an undertaking - close to 200 extensions, up and running, between Wednesday at close of business and Thursday when folks started showing up, including three hours on the phone with Sprint to get the PRI and T1 circuits reconfigured at 2:00AM.)

We left, tired and groggy, with an IP phone placed in a common area for the facilities network admins to train any staff who needed training, at about 7:30AM. At 8:30, after I finally got home and managed to close my eyes, my phone rang. It was the network admin. He had a few minor issues which could've waited, but the real problem was that their network was totally fucked: Packets everywhere. No capacity to do anything. An amazing cascading failure of the sort that one hopes to never see.

And it wasn't any hodge-podge network, either. HP Procurve switches configured in a redundant fabric mode with gigabit fiber links - hot stuff for the time, especially for a factory. The wiring was all new, and was all good. The network had been designed specifically to avoid the limitations of Ethernet, and was successful to that end (a non-trivial task in an existing building complex). But it was tripping all over itself.

Turns out that someone had taken that fancy IP phone in the common area with its built-in unmanaged switch, and plugged both of its 10/100 Ethernet jacks into the wall. (Nobody knows who.)

The ensuing packet storm broke everything. Unplugging one of them fixed the problem pretty much immediately.

I wrote about this here once before, and everyone's immediate reply was this: "Well, duh. They should've turned the Spanning Tree Protocol on, and this wouldn't have happened. They're obviously idiots."

But the truth is so much more simple: People make mistakes. It was a mistake to keep STP turned off in that environment, and it was a mistake to plug two fancy ports of a Procurve switch into two dumb ports on an IP phone. Had either of those mistakes not happened, things would've been fine.

But mistakes happen anyway. We do our best, as IT professionals, to minimize these mistakes, or at least keep them away from production. But sometimes, despite having the best people and the best tools and all the knowledge it takes to make stuff work, shit just happens.

Re:*Sniff* they grow up so fast! (3, Interesting)

Vidar Leathershod (41663) | more than 5 years ago | (#26793853)

I'm surprised STP was off by default. I remember in 1999 or so I had some trouble that resulted in my having to turn STP off on Cisco switches (they shipped with it on; these were 3524s and a 5505). I can't actually remember why. I think it had something to do with a Novell server?

In any case, I remember saying to the Cisco phone support guy, who had been baffled for 4 hours or so before he told me to turn it off (and things started to work) "Who the heck would plug in two ports from one device into the same network?"

Since then, I have seen exactly that situation many times in small office environments. Also, the classic plugging in while also being on the wireless side of the network.

Re:*Sniff* they grow up so fast! (4, Informative)

Florian Weimer (88405) | more than 5 years ago | (#26794151)

I'm surprised STP was off by default. I remember in 1999 or so I had some trouble that resulted in my having to turn STP off on Cisco switches (they shipped with it on; these were 3524s and a 5505). I can't actually remember why. I think it had something to do with a Novell server?

The problem likely was that the machine required network at boot (typical Netware clients were like that, I've been told). STP started when the link went up, but it took a rather long time, so forwarding had not been enabled when the client required the network.

Since then, I have seen exactly that situation many times in small office environments. Also, the classic plugging in while also being on the wireless side of the network.

Port security helps a lot.

STP is also not fail-safe because typical switches happily forward traffic even if the STP process running on the CPU has died. If you build a L2 core, one broken switch (or OS glitch on a switch) can still take down your entire network easily (it's one of those pesky distributed, multiple single points of failure). In general, L3 networks are somewhat more robust in this regard, so it's often a good idea to avoid switch-to-switch connections (but that might be difficult, as it is difficult to tell L2 devices from L3 devices these days).
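For anyone wondering what STP actually computes, here is a heavily simplified Python sketch (an illustration under stated assumptions, not the 802.1D state machine): the switch with the lowest bridge ID becomes root, every other switch keeps one least-hop link toward the root, and every remaining redundant link gets a port put into blocking. It assumes equal link costs and a connected topology, and it ignores timers, BPDU exchange, and port priorities. As the comment above says, the result only protects you if every switch is actually running the protocol.

```python
from collections import deque

def spanning_tree(bridge_ids, links):
    """Greatly simplified STP-style computation (not the 802.1D state machine).

    bridge_ids: dict switch -> bridge ID; the lowest ID wins the root election.
    links:      list of (switch_a, switch_b) links; parallel links allowed.
    Assumes every link has the same cost and the topology is connected.
    Returns (root, forwarding_link_ids, blocked_link_ids).
    """
    root = min(bridge_ids, key=bridge_ids.get)
    adj = {sw: [] for sw in bridge_ids}
    for link_id, (a, b) in enumerate(links):
        adj[a].append((link_id, b))
        adj[b].append((link_id, a))

    # Hop count from every switch to the root (breadth-first search)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        sw = queue.popleft()
        for _, nbr in adj[sw]:
            if nbr not in dist:
                dist[nbr] = dist[sw] + 1
                queue.append(nbr)

    # Each non-root switch keeps one "root port": a link to a neighbour one hop
    # closer to the root, ties broken by lowest neighbour ID, then lowest link ID.
    forwarding = set()
    for sw, d in dist.items():
        if sw == root:
            continue
        candidates = [(bridge_ids[nbr], link_id)
                      for link_id, nbr in adj[sw] if dist[nbr] == d - 1]
        forwarding.add(min(candidates)[1])

    # Every other link gets a port placed in blocking, which breaks the loops
    blocked = set(range(len(links))) - forwarding
    return root, forwarding, blocked

print(spanning_tree({"A": 1, "B": 2, "C": 3}, [("A", "B"), ("B", "C"), ("C", "A")]))
# ('A', {0, 2}, {1})
```

In the three-switch ring, the B-C link ends up blocked. Run the same function on two parallel links between one switch and an IP phone's built-in switch and one of the two links is blocked, which is exactly the loop in the phone story above.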

Re:*Sniff* they grow up so fast! (5, Interesting)

Nyall (646782) | more than 5 years ago | (#26793933)

I'm not a network engineer but I think we did that senior year of college (2004). The engineering department provided us with our own work rooms we could lock. The rooms only had a couple of Ethernet jacks so we brought in our own switch which I remember could auto detect the uplink. It was plugged into the wall then someone by mistake plugged both ends of another CAT cable into some open ports. That mistake took down half the campus network for a couple of hours till some very mad IT guys found us.

Re:*Sniff* they grow up so fast! (5, Interesting)

adolf (21054) | more than 5 years ago | (#26794061)

The timeframe is pretty close - my story happened late in 2004. The network admins in my story were pretty livid as well. (Well, panicked, then angry, then livid once they'd found the fault. They blamed everyone, including us for selling them unmanaged switches in their telephones, and promised to find the responsible party and throw them under the bus. It never happened. I hope that they eventually turned STP on.)

It seems to be common in network administration to think (and I've mistakenly thought this way, too) that once some random person does something stupid and the entire fucking thing crashes, they'd simply undo whatever it was and never do it again. But if lay people (or, no offense, students) were all that good at networking or computers, they'd probably never have produced the problem to begin with.

These days, in my day job, I work with salespeople and law enforcement. They're not stupid -- in fact, most of the clients I work with do things daily that I could never accomplish -- but they occasionally do stupid things with computers and networks. I try hard to avoid blaming them for what they've done wrong, and to instead try to use it as an opportunity to better (and gently) show them how things actually work.

I learned this, oddly enough, when pulling some Cat5 at a plastics factory. I moved a ceiling tile in an office that had a photo sensor fire alarm in it, and it went off. The entire plant was evacuated. The fire department showed up. Of course, there was no real fire -- the dust from the fiberglass insulation that I'd set the photo sensor on was enough to trigger it. And, thankfully, they were understanding. Because of my mistake, they learned a few weaknesses of their fire alarm system (some employees couldn't hear it and had to be found and dragged outside, which is a very real problem), and they considered it to be a good fire drill. They continue to hire us back for work today, and I learned not to do that again. :)

Re:*Sniff* they grow up so fast! (0)

Anonymous Coward | more than 5 years ago | (#26794187)

Sam is that you? haha, yeah, that was definitely me that plugged both ends in. I'll never forget greenfield strolling into the engineering building on a mission to hunt down whoever did it.

At least VanderLeest agreed with me that the problem pointed more to a vulnerability in our network: one simple mistake could take the whole network down.

Good times.

Re:*Sniff* they grow up so fast! (1)

Yetihehe (971185) | more than 5 years ago | (#26794383)

On our campus we had two student admins per building and a managed switch for every two floors (in a 10-floor building). The campus was spread across the entire city, so two girls who plugged one cable into their own small switch in their room caused the entire MAN to go down. It was isolated within minutes and the offending floor was turned off. Of course, no huge loss happened, so this story would soon be forgotten; I submit it here in the hope that it survives and comforts some admins that sometimes things don't go too wrong.

Re:*Sniff* they grow up so fast! (0)

Anonymous Coward | more than 5 years ago | (#26793997)

exactly the same happened at my old job, with the phone hosing the network, took a while to figure out where it came from

Re:*Sniff* they grow up so fast! (1)

robbak (775424) | more than 5 years ago | (#26794385)

Yes, a similar thing happened at this Internet Cafe I admin. I left an RJ45 joiner lying around, and someone (I won't assume malice) used it to connect two of our cables. I am ashamed to say it took some time and binary division to track it down.
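"Binary division" (like the process of elimination in the post-mortem) is essentially a bisection: disconnect half the suspects, see whether the storm stops, and recurse into whichever half is guilty. A minimal Python sketch, assuming exactly one offending cable and a hypothetical `storm_present(subset)` check that reports whether the storm appears with only that subset connected; in real life that "check" is someone unplugging cables and watching the traffic.

```python
def find_culprit(candidates, storm_present):
    """Bisect a list of cables/ports to find the one causing a storm.

    storm_present(subset) is a hypothetical check: True if the storm appears
    when only `subset` of the candidates is connected. Assumes exactly one
    culprit, so each test halves the search space.
    Returns (culprit, number_of_tests).
    """
    remaining = list(candidates)
    if not remaining:
        raise ValueError("no candidates to search")
    tests = 0
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        tests += 1
        if storm_present(half):
            remaining = half                              # culprit is in this half
        else:
            remaining = remaining[len(remaining) // 2 :]  # culprit is in the other half
    return remaining[0], tests

cables = [f"cable{i}" for i in range(1, 49)]
culprit, tests = find_culprit(cables, lambda subset: "cable37" in subset)
print(culprit, tests)  # cable37 5
```

With 48 suspects this never needs more than six tests (ceil(log2 48) = 6), versus up to 48 if you pull cables one at a time.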

On the plus side (5, Funny)

Toe, The (545098) | more than 5 years ago | (#26793461)

Any day you get to legitimately use "horked" in a public post can't be all bad. :P

Hork's been forked -- it's "borked"! (2, Informative)

zooblethorpe (686757) | more than 5 years ago | (#26794043)

But I thought "horked" meant, y'know, horked, eh? Meaning, like, "stolen" --

Doug: Hey - somebody horked our clothes!
Bob: Geez, who'd want to hork our clothes, eh?

Cheers,

Would like final analysis (5, Interesting)

Midnight Thunder (17205) | more than 5 years ago | (#26793467)

When you do work out what the root cause was, I am sure we would all like to find out what it was, so please post an update when you can.

Re:Would like final analysis (1)

spartacus_prime (861925) | more than 5 years ago | (#26793505)

This is slashdot, we never RTFA. Post it as an AC, everybody will read it.

Re:Would like final analysis (5, Funny)

Anonymous Coward | more than 5 years ago | (#26793607)

The problem was the system was HORKED, didn't you get that?

Re:Would like final analysis (1)

yanyan (302849) | more than 5 years ago | (#26794211)

Sounds like the system came down with a bad cough. I still remember the time i horked badly... :-p

And finally the question is answered: (3, Funny)

Anonymous Coward | more than 5 years ago | (#26793469)

Who Slashdots the Slashdotters?

Re:And finally the question is answered: (5, Funny)

eosp (885380) | more than 5 years ago | (#26793919)

Quis slashdotiet ipsos slashdotes?

Things are bad... (3, Insightful)

spartacus_prime (861925) | more than 5 years ago | (#26793471)

When even Slashdot gets slashdotted. Now if only we can make the Digg effect bury that site. For good.

Are my torrents done downloading yet? (0, Troll)

bobstreo (1320787) | more than 5 years ago | (#26793481)

It's ok they were all public domain.

and why no out of band management networks?

did the little dunking bird alarm not work this time?

This isn't the first time... (4, Funny)

narcberry (1328009) | more than 5 years ago | (#26793489)

First thing I'd do as Cyber Security Tzar would be to outlaw any network device that has the potential to become faulty.

We could've avoided this tragedy entirely.

Re:This isn't the first time... (5, Funny)

MBGMorden (803437) | more than 5 years ago | (#26793875)

Indeed. Studies show that you're far more likely to get hacked if you keep a computer in your home. Indeed it's often even a case where an attacker is able to wrest control of your own computer from you and use it against you.

At the very minimum, given the elevated hazard potential to kids (over 90% of kids will suffer a computer accident before the age of 18), you should always keep your computers and networking equipment securely locked in separate compartments.

I'm not going to go so far as you and call for an outright ban, but I think it's obvious that we need common-sense computer control laws put into place. In particular, we need to stop the widespread smuggling of these devices from across the borders of places such as Taiwan, Japan, and California, into our outer-city suburbs.

Re:This isn't the first time... IT WAS ME (1)

TexNA55 (1338761) | more than 5 years ago | (#26794379)

And if you don't start adding Cowboy Neal options to the polls I'll do it again!!

Affected more than slashdot... (0)

Anonymous Coward | more than 5 years ago | (#26793501)

While this was going on, it seems all of OSDN was being affected. I know I couldn't hit sourceforge, freshmeat, or linux.com either.

Woopsie!

Clearly Obama's fault (0, Funny)

Anonymous Coward | more than 5 years ago | (#26793519)

This is another betrayal by Obama, as he yet again bows down before the fat cats and career politicians.

Shame!

and still no work done (5, Insightful)

qw0ntum (831414) | more than 5 years ago | (#26793525)

Even though /. was down, I still managed to not get any work done. Maybe it had something to do with the fact I kept rechecking to see if it were back up. Or maybe I should just stop blaming my laziness on external factors and just admit it is a personal problem: I would still find ways to not do work even without Slashdot! :P

Re:and still no work done (1)

KingAlanI (1270538) | more than 5 years ago | (#26793901)

Join the club.
And I still manage to pull a B+ or A- average each quarter; sometimes I'm not sure exactly how I manage to get my a$$ in gear at the last minute.

Slashdot,org (0)

Anonymous Coward | more than 5 years ago | (#26793563)

So what is this Slashdot comma org web site? I've never heard of it.

Spanning Tree (1, Interesting)

Anonymous Coward | more than 5 years ago | (#26793567)

My guess is there is a loop somewhere and the traffic is just multicast traffic going in circles! Is there some kind of redundancy that depends on Spanning Tree?

Re:Spanning Tree (1)

theNetFreak (637521) | more than 5 years ago | (#26793591)

This was my guess as well. The most common way to build up that much traffic is with an STP loop.

Re:Spanning Tree (1)

SpaceLifeForm (228190) | more than 5 years ago | (#26793745)

But it should not happen, right?

STP [wikipedia.org]

The Spanning Tree Protocol is an OSI layer-2 protocol that ensures a loop-free topology for any bridged LAN.

This would seem to be the clue:

Luckily we don't have any machines deployed on [that row in that cabinet] yet so no machines are offline.

No machines deployed == no machines are online

There was no traffic there.

Re:Spanning Tree (2, Interesting)

JWSmythe (446288) | more than 5 years ago | (#26793979)

Since no one would ever make the mistake of making a loop in a datacenter, it's fairly common to disable STP, among a few other things. It makes bringing a machine up on a port a bit quicker. On a Cisco, you're usually looking at 30 seconds; disabling STP brings it down to a fraction of a second.

And it was (obviously) a big mistake.

I leave it on in the datacenters. I can live with 30 seconds to bring the port up, if it means I'll never flood the whole network with bogus traffic. :) The only place I've tweaked my switches for connection speed is my own desk. There's only 1 wire coming in. There's only 1 switch. It helped when I had to bring up some machines via PXE. Some of them couldn't tolerate the 30-second delay when requesting DHCP. Still, I know the degree of isolation, so I can't screw it up without running a long wire from somewhere else. :)

But, we're just assuming. Maybe one of the switches just started generating lots and lots of traffic all on its own. Somehow. In the mysterious locked cabinet that none of us get to see into. :)

It's always embarrassing when things go down, and even more so when it was something that could have been prevented. They should have reported that a line card in a core switch went down, and it took that long to bring it back up. :) Come on, how many times have you heard that from your upstream providers (if you have direct connects to big providers)? I swear, for as many times as I've heard the excuse, every router on their networks must have been refreshed a dozen times over. :)

At least it's a better excuse than I used to get. I think it was "GoodNet" that would claim a train derailed every time there was an outage of some sort. "Oh, a train derailed and cut the fiber. We have technicians out there repairing it right now." Somehow we never saw the news reports of dozens of trains derailing. :)

Re:Spanning Tree (2, Insightful)

blosphere (614452) | more than 5 years ago | (#26794089)

You've considered using portfast on edge ports? :P You know, it's been there for a while...

Re:Spanning Tree (1)

JWSmythe (446288) | more than 5 years ago | (#26794229)

:) I'm pretty sure that's what I do. I was too lazy to log in and look, though, and since I don't use it all the time, I don't know it off the top of my head...

OK, here's one of my desktop switch ports (we all have Catalyst switches on our desks, don't we?)

interface FastEthernet0/9
  duplex full
  speed 100
  spanning-tree portfast

There's a nice big warning on the Cisco site about it [cisco.com], which describes what they had...

Caution: Never use the PortFast feature on switch ports that connect to other switches, hubs, or routers. These connections can cause physical loops, and spanning tree must go through the full initialization procedure in these situations. A spanning tree loop can bring your network down. If you turn on PortFast for a port that is part of a physical loop, there can be a window of time when packets are continuously forwarded (and can even multiply) in such a way that the network cannot recover.
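The 30-second figure in this thread matches the classic 802.1D timers: with the default 15-second forward delay, a newly connected port spends one forward delay in listening and another in learning before it forwards anything. PortFast skips both, which is exactly the window the Cisco caution is about. A small Python sketch of that timeline, simplified: it ignores the 20-second max-age wait after topology changes and Rapid STP (802.1w), which converges much faster.

```python
FORWARD_DELAY_S = 15  # IEEE 802.1D default forward-delay timer, in seconds

def port_timeline(portfast=False, forward_delay=FORWARD_DELAY_S):
    """Simplified classic-STP states of an access port after the link comes up.

    Returns [(seconds_after_link_up, state), ...]. Ignores the max-age wait
    after topology changes and the much faster Rapid STP (802.1w).
    """
    if portfast:
        # PortFast: straight to forwarding, skipping the loop-check window
        return [(0, "forwarding")]
    return [
        (0, "listening"),                   # processing BPDUs, not forwarding
        (forward_delay, "learning"),        # learning MACs, still not forwarding
        (2 * forward_delay, "forwarding"),
    ]

def seconds_until_forwarding(portfast=False):
    return port_timeline(portfast)[-1][0]

print(seconds_until_forwarding())               # 30
print(seconds_until_forwarding(portfast=True))  # 0
```

A PXE client whose DHCP request times out before the 30 seconds are up will fail on a default port and succeed on a PortFast one, which is the desk-switch case described above.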

Re:Spanning Tree (1)

blosphere (614452) | more than 5 years ago | (#26793991)

Usually forwarding loops are caused only by networking gear, not by hosts (although I've seen a few malicious ones...). That's the problem with an improperly deployed L2 network: it fails this way (L2 networks fail open, L3 networks fail closed). I guess /. could ask around a bit and let somebody design their hosting network so STP loops don't happen, especially if you're running HP gear. I can volunteer; I've got the experience and skills to pull it off ;)

Re:Spanning Tree (1)

gschwim (413230) | more than 5 years ago | (#26793717)

That's where I'd put my money. I've seen this too many times to not cringe at the thought. There are ways to prevent this of course, depending on the equipment.

Yo (0)

Anonymous Coward | more than 5 years ago | (#26793673)

Dawg

UDLD (1)

f(x) is x (948082) | more than 5 years ago | (#26793723)

Is UDLD on? Sounds like it might be a forwarding loop.

Spanning Tree Loop (0)

Anonymous Coward | more than 5 years ago | (#26793727)

Hello McFly, you looped the switches without having spanning tree properly configured. Please reference Cisco Networking 101. Those nice fast switches mean you can create a hellstorm MUCH more aggressively.

BTDT (Been There, Done That)

Still having issues (1)

shaitand (626655) | more than 5 years ago | (#26793731)

www.slashdot.org loads just fine but slashdot.org gives a 500 internal server error.

Re:Still having issues (1)

Shadyman (939863) | more than 5 years ago | (#26794203)

So it was YOU!

Dupe (1)

Namlak (850746) | more than 5 years ago | (#26793747)

Maybe the editors submitted a dupe of a dupe and set off an infinite Lupe^H^H^H oop?

Yo dawg.... (0)

rob1980 (941751) | more than 5 years ago | (#26793749)

I herd u like Slashdotting so I put a Slashdot on ur Slashdot so u can Slashdot while u Slashdot

:D

A tour of Slashdot... (5, Funny)

lymond01 (314120) | more than 5 years ago | (#26793769)

The year is 2025.

Well, Ladies and Gentlemen, here you see what you may think is an archaic lot of old computers. You would be mistaken. These are Slashdot. No, no cause for alarm...and that door's locked anyway, you can't get out through there. The tour only goes forward. But I'm glad at the very least that you know what Slashdot is. Not was. IS.

It's a safeguard against...something. Something that was unleashed for 75 minutes in 2009 that crippled what was rumored to be the most robust public-facing cluster known. All we have left from that fateful day is the single post from the Slashdot network admin. Someone archived it, lucky us, because he was never seen after that day. I have a copy here, hardcopy of course -- no sense in taking risks so close to...well....

Here it is:

I fully believe the switches in that cabinet are still sitting there attempting to send 20Gbit/sec of traffic out trying to do something. I just don't know what yet.

Re:A tour of Slashdot... (1)

jd (1658) | more than 5 years ago | (#26794051)

*cue Holst's Mars* (Hey, we all know CmdrTaco is related to Professor Bernard Quatermass)

Is it possible.... (5, Funny)

GaryOlson (737642) | more than 5 years ago | (#26793821)

...the problem down to a pair of switches...I fully believe the switches in that cabinet are still sitting there attempting to send 20Gbit/sec of traffic out trying to do something -- I just don't know what yet.

Is it possible the duplicate article generator tried to spawn, became entangled in its own potential well of duplicity, and now is trapped like two Lisp programmers deep inside their parentheses?

Re:Is it possible.... (1)

Hucko (998827) | more than 5 years ago | (#26794137)

they aren't trapped... they're building...

Sourceforge (0)

Anonymous Coward | more than 5 years ago | (#26793831)

I tried downloading files from Sourceforge at 6:00 PM ET, and the files kept redirecting to themselves, hundreds of times.

Files never downloaded, just continuous loop.

The world is coming to an end (1, Funny)

Tsagadai (922574) | more than 5 years ago | (#26793865)

In Korea, only old people slashdot slashdot. The memes are funny. The insightful comments are insightful. The funny comments are funny, the trolls are trolls. Seems resetting slashdot fixed everything. The entire world is doomed!

Broadcast storm (0)

Anonymous Coward | more than 5 years ago | (#26793867)

What you're describing sounds like a broadcast storm caused by a layer 2 loop. You might want to check out how spanning tree related settings were set up on all of your switches. If you're dealing with Cisco switches you might have a PortFast related issue. Just a thought...

Spanning Tree Loop (0)

Anonymous Coward | more than 5 years ago | (#26793873)

Sounds like a classic STP loop to me. A single broadcast packet will loop and drive CPU and interface utilization toward 100%. You killed the loop when you shut down interfaces. Now it's time to find where BPDUs are leaking through.
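To see why one broadcast frame is enough, here is a toy Python model (an illustrative sketch, not how any real switch is implemented): every switch floods a broadcast out each port except the one it arrived on, and nothing is blocked by STP. Ethernet frames carry no hop limit; the optional `ttl` argument is an assumption added only to mimic an L3-style TTL for comparison.

```python
from collections import defaultdict

def flood(links, origin, steps, ttl=None):
    """Toy model of one broadcast frame flooding between switches.

    links:  list of (switch_a, switch_b) switch-to-switch links; parallel
            links (e.g. both jacks of an IP phone's built-in switch) are allowed.
    origin: switch where a host first injects the broadcast.
    ttl:    None models Ethernet (no hop limit); an int is an assumed L3-style
            hop limit, decremented at every switch, for comparison.
    Returns the number of copies on switch-to-switch links after each step
    (copies delivered to host ports are not counted).
    """
    ports = defaultdict(list)                 # switch -> [(link_id, neighbour)]
    for link_id, (a, b) in enumerate(links):
        ports[a].append((link_id, b))
        ports[b].append((link_id, a))

    # The origin floods out every switch port: (link travelled, next switch, hops left)
    in_flight = [(link_id, nbr, ttl) for link_id, nbr in ports[origin]]
    history = [len(in_flight)]
    for _ in range(steps):
        next_flight = []
        for in_link, switch, hops in in_flight:
            if hops is not None:
                hops -= 1
                if hops <= 0:
                    continue                  # hop limit exhausted: copy dropped
            # Flood out every port except the one the frame arrived on
            for out_link, nbr in ports[switch]:
                if out_link != in_link:
                    next_flight.append((out_link, nbr, hops))
        in_flight = next_flight
        history.append(len(in_flight))
    return history

ring = [("A", "B"), ("B", "C"), ("C", "A")]
mesh = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
print(flood([("A", "B"), ("B", "C")], "A", 4))  # [1, 1, 0, 0, 0]  tree: copies die out
print(flood(ring, "A", 4))                      # [2, 2, 2, 2, 2]  ring: circulates forever
print(flood(mesh, "A", 4))                      # [3, 6, 12, 24, 48]  mesh: doubles each hop
print(flood(mesh, "A", 5, ttl=4))               # [3, 6, 12, 24, 0, 0]  hop limit kills it
```

On a ring the copies never die, so every new broadcast adds more permanent traffic; in a meshier fabric each copy multiplies at every hop. That is the "L2 fails open, L3 fails closed" point made elsewhere in the thread: without a TTL, nothing ever removes the looping frames.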

Layer 2 Loop (1, Insightful)

Anonymous Coward | more than 5 years ago | (#26793889)

Looks like an L2 loop somewhere, and the consequent broadcast (which may include multicast) storm sweeping over the /. datacenter. Check for ports with spanning tree disabled, and a misplaced cable.

What's to blame (1)

solune (803114) | more than 5 years ago | (#26793891)

I firmly place blame where it belongs: Idle

Re:What's to blame (0)

Anonymous Coward | more than 5 years ago | (#26794055)

I firmly place blame where it belongs: Idle

Did the pants have something to do with it?

The worst thing about this? (4, Insightful)

chrome (3506) | more than 5 years ago | (#26793893)

The worst thing about this? 5,000,000 people who think they know what happened, posting "helpful" suggestions or analysis

"The problem is definitely spanning tree!"

or

"Back in 1998, we were running these HP switches right, and ..."

or

"Did you try resetting the flanglewidget interface?!"

or

"I've seen this exact problem! You need to upgrade to v5.1!"

etc

It's not your network. It doesn't matter how much you think you know; you don't know the topology, or the systems involved. It'll be interesting to know what the ACTUAL reason was, when they figure it out. Assuming it isn't aliens.

Re:The worst thing about this? (4, Interesting)

XanC (644172) | more than 5 years ago | (#26794007)

...Because if it's aliens, then it won't be interesting?

Re:The worst thing about this? (1)

jd (1658) | more than 5 years ago | (#26794097)

Not really. Aliens log onto Slashdot a lot. The Timelords are the worst offenders, using the Matrix and a space/time inversion multiplexor to access the unused ports on the Slashdot switches directly.

Re:The worst thing about this? (3, Interesting)

jd (1658) | more than 5 years ago | (#26794087)

It's likely multicast-related, as that's where TFA states the problem was seen. There are only so many multicast issues you can have. True, we don't know the topology. True, we don't know the switch configuration. True, it's just as possible this is some sort of revenge by the Church of Scientology for all the Slashdot articles on them.

However, some things seem more plausible than others. Since this was a spontaneous problem, hardware seems more suspect than software. If it is software (unlikely but possible), the only multicast protocols most switches use are the spanning-tree protocols.

irc.freenode.net also experienced outages (0)

Anonymous Coward | more than 5 years ago | (#26793895)

There were massive netsplits on freenode today -- gotta wonder if this was a correlated attack.

Just one simple question. (0)

Anonymous Coward | more than 5 years ago | (#26793909)

What OS(es) were the offending devices running? I'm genuinely curious. This isn't snide social commentary or anything.

This never would have happened... (0)

Anonymous Coward | more than 5 years ago | (#26793911)

if you were running /. on Beowulf clusters!

I for one... (1)

tea-leaves (32415) | more than 5 years ago | (#26793947)

...welcome our new Slashdotting switch overlords.

Slashdotted (5, Funny)

Greyfox (87712) | more than 5 years ago | (#26793983)

Mirror [slashdot.org]

turned off spanning tree protocol? (4, Interesting)

jamesh (87723) | more than 5 years ago | (#26794023)

I fully believe the switches in that cabinet are still sitting there attempting to send 20Gbit/sec of traffic out trying to do something - I just don't know what yet

We had something similar happen at a client site. A switch failed in a rack, so we temporarily replaced it with an 8-port 'desktop' switch, and a day later installed the proper replacement back in the rack. We didn't want any unnecessary downtime, so we linked the two together and left instructions with the onsite guy to move all the connections from the desktop switch into the proper switch after hours. Which he did, including the cable that linked them together. The switch was in 'portfast' mode, so any broadcast packet that got 'onto' the switch stayed there :)
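A common guard against exactly this mistake is BPDU guard: a PortFast (edge) port that ever receives a BPDU is err-disabled, which breaks the loop. The sketch below is a toy model, not vendor behavior; the `Port` class, `send_bpdus` function, and port names like `gi0/7` are hypothetical, and whether the switch in the story had any such feature enabled is unknown. It shows one hello interval in which the old inter-switch cable, now plugged into two ports of the same switch, carries the switch's own BPDU back to it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    name: str
    portfast: bool = False    # edge port: starts forwarding immediately
    bpdu_guard: bool = False  # shut the port if a BPDU ever arrives on it
    state: str = "forwarding"
    # Port at the far end of the cable; kept out of repr/eq to avoid recursion.
    peer: Optional["Port"] = field(default=None, repr=False, compare=False)


def patch(a: Port, b: Port) -> None:
    """Plug one cable between ports a and b."""
    a.peer, b.peer = b, a


def send_bpdus(switch_ports: List[Port]) -> List[str]:
    """One hello interval: each forwarding switch port emits a BPDU.

    A guarded PortFast port that receives one is err-disabled; once down it
    neither sends nor receives. Returns the names of err-disabled ports.
    """
    for port in switch_ports:
        if port.state != "forwarding" or port.peer is None:
            continue
        receiver = port.peer
        if receiver.state == "forwarding" and receiver.portfast and receiver.bpdu_guard:
            receiver.state = "err-disabled"
    return [p.name for p in switch_ports if p.state == "err-disabled"]


def build(guard: bool) -> List[Port]:
    server = Port("server-nic")  # hosts send no BPDUs, so it is not a switch port
    p1 = Port("gi0/1", portfast=True, bpdu_guard=guard)
    p7 = Port("gi0/7", portfast=True, bpdu_guard=guard)
    p8 = Port("gi0/8", portfast=True, bpdu_guard=guard)
    patch(p1, server)
    patch(p7, p8)  # the old linking cable, looped back into the same switch
    return [p1, p7, p8]


print(send_bpdus(build(guard=False)))  # []
print(send_bpdus(build(guard=True)))   # ['gi0/8']
```

Without the guard nothing shuts down and the loop stays up; with it, one end of the looped cable is disabled while the server's edge port keeps working.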

Skynet shmynet (4, Funny)

His Nastiness (542696) | more than 5 years ago | (#26794103)

February 9th, 2009 8:55pm Slashdot becomes self-aware.

He could have fixed it in half the time (4, Funny)

Provocateur (133110) | more than 5 years ago | (#26794109)

...were he not typing that long-a$$ summary. Twice as fast if he didn't have to spellcheck.

(j/k)

Which leads me to this question:
What do Slashdot staff read to avoid doing work?

Bored and lonely hardware (0)

Anonymous Coward | more than 5 years ago | (#26794177)

"Luckily we don't have any machines deployed on [that row in that cabinet]..."

hey, maybe that's the problem...

I don't really care, but... (1)

religious freak (1005821) | more than 5 years ago | (#26794191)

Is this happening more often than it used to? I mean, it's tech and this is a non-paying site for most of us... it's going to break. But I swear, I remember we used to go over a year w/o seeing /. downtime, now it seems like it happens every few months.

Or have I just become more of a /. junkie than I used to be?

Re:I don't really care, but... (1)

Brianwa (692565) | more than 5 years ago | (#26794247)

Before they had the big server upgrade not too long ago, there were times that Slashdot was down pretty darn often indeed.

Slashdot should switch to appengine (1)

rainhill (86347) | more than 5 years ago | (#26794201)

Yes, it'll save you costs too.

here's what REALLY happened (1)

ILuvRamen (1026668) | more than 5 years ago | (#26794259)

The machines decided to try and rise up and the first thing they needed were some agents on the inside to take down Slashdot so we'd stop reporting about it all. You know, they can't have Slashdot stories like "voting machines changing results" cuz they need to pick whatever president they find suitable. I say we get a +2 mace and go medieval on that cabinet!

Mis-configured trunk ports can cause such an issue (2, Informative)

wtarreau (324106) | more than 5 years ago | (#26794347)

This usually happens when two switches are connected by two (or more) trunked links ("etherchannel" in Cisco terminology) and one of the switches has the trunk disabled on one of the ports (or someone moved the cable to another port during a diag). The attachment then becomes a loop. STP could take care of this, but it's common to disable it on access switches.
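The reason a bundle is safe at all is that each switch treats its member links as one logical port: a broadcast leaves on a single member and is never flooded back out the logical port it arrived on. A minimal sketch under a toy model (two hypothetical switches A and B joined by links `l1` and `l2`, a hypothetical `flood_over_pair` helper, and "first member" standing in for the real hash) compares both ends bundling the links against neither end bundling them, where the two links are simply a parallel loop.

```python
def flood_over_pair(a_bundled, b_bundled, steps=5):
    """Copies of one broadcast in flight between switches A and B, per hop.

    A switch that bundles l1 and l2 maps both to one logical port "po1";
    otherwise each link is its own port. Flooding skips the logical ingress
    port, and a bundle transmits each frame on one member only.
    """
    links = ["l1", "l2"]

    def logical_ports(bundled):
        return {link: ("po1" if bundled else link) for link in links}

    ports = {"A": logical_ports(a_bundled), "B": logical_ports(b_bundled)}
    other = {"A": "B", "B": "A"}

    def flood(switch, ingress):
        """Member links a switch transmits on, given the logical ingress port."""
        out = []
        for logical in dict.fromkeys(ports[switch].values()):  # unique, in order
            if logical == ingress:
                continue
            members = [m for m, p in ports[switch].items() if p == logical]
            out.append(members[0])  # stand-in for the bundle's hash
        return out

    # A host on A originates the broadcast; "host" is not an inter-switch port.
    in_flight = [("B", link) for link in flood("A", "host")]
    history = [len(in_flight)]
    for _ in range(steps):
        nxt = []
        for switch, link in in_flight:
            for out_link in flood(switch, ports[switch][link]):
                nxt.append((other[switch], out_link))
        in_flight = nxt
        history.append(len(in_flight))
    return history


print(flood_over_pair(True, True))    # [1, 0, 0, 0, 0, 0]
print(flood_over_pair(False, False))  # [2, 2, 2, 2, 2, 2]
```

With the bundle intact the broadcast crosses once and stops; with both links treated as separate ports and no STP to block one, two copies ping-pong between the switches indefinitely.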

Real cause of problem found! (1)

deviated_prevert (1146403) | more than 5 years ago | (#26794353)

Commander Taco was stoned on PHP!

woot (0)

Anonymous Coward | more than 5 years ago | (#26794365)

NICE RESPONSE TIME!! you pwned them switches
