
Testing Network Changes When No Test Labs Exist?

timothy posted more than 4 years ago | from the michael-gurski-special dept.

Networking 164

vvaduva writes "The ugly truth is that many network guys secretly work on production equipment all the time, or test things on production networks when they face impossible deadlines. Management often expects us to get a job done but refuses to provide funds for expensive lab equipment, test circuits, and reasonable time to get testing done before moving equipment or configs into production. How do most of you handle such situations, and what recommendations do you have for creating a network test lab on the cheap, especially when core network devices are vendor-centric, like Cisco?"


Pretty simple, really (1, Insightful)

Anonymous Coward | more than 4 years ago | (#30547830)

Whenever you're working in/on a production environment, only one rule matters:

Don't fuck it up.

Re:Pretty simple, really (5, Funny)

symbolset (646467) | more than 4 years ago | (#30548034)

Oh, no. We do this all the time. Around the holidays we rewire the production server racks so their ethernet cables droop over the aisles, so we can hang up Christmas cards. Jimmy has a script that blinks the blue UID lights for a festive holiday display.

Re:Pretty simple, really (1)

X0563511 (793323) | more than 4 years ago | (#30549232)

You mean the IPMI LEDs? Dell PowerEdge servers have a dual-color LED that can flash blue or orange to signal errors and the like. It's accessible via IPMI, along with all sorts of other goodies, like serial over Ethernet that works independently of the OS.

Re:Pretty simple, really (1)

MeatBag PussRocket (1475317) | more than 4 years ago | (#30548102)

Damn straight, that's why you get paid.

In theory, theory and practice are the same; in practice they're not. Your job is to make it that way.

Replace "theory" with "lab" and you see the fundamental flaw in the false sense of security a lab provides.

Re:Pretty simple, really (-1, Troll)

Anonymous Coward | more than 4 years ago | (#30548480)

If by "make it that way" you mean "turd", I layed down two of them last night. Blocked up the toilet real good, they did. What's the deal with low flow toilets? I don't consider myself a prolific shitter, but most of my brown trout need two or three flushes to go down. Where's the water savings in that? Granted, if all you're doing is draining the limber log, a standard toilet wastes water, but in that case I generally don't flush anyhow. Yellow is mellow but brown goes down -- words to live by in both theory and practice. By the way, black people don't like the word "nigger".

Re:Pretty simple, really (1)

19thNervousBreakdown (768619) | more than 4 years ago | (#30548640)

Holy blast from the past.

Re:Pretty simple, really (0, Offtopic)

bertoelcon (1557907) | more than 4 years ago | (#30548818)

By the way, black people don't like the word "nigger".

And yet they are allowed to use it themselves? Hypocrisy much?

Re:Pretty simple, really (0, Offtopic)

poopdeville (841677) | more than 4 years ago | (#30549294)

Everybody is "allowed" to use it. Black people who go around calling other black people "niggers" are seen as scum by every black person I have ever met.

Re:Pretty simple, really (0)

Anonymous Coward | more than 4 years ago | (#30548572)

If you haven't fucked something up in production, I don't want you on the team fixing my network when something DOES accidentally go wrong in production.

Re:Pretty simple, really (1)

Ozric (30691) | more than 4 years ago | (#30548896)

A very wise network admin once told me:

Wisdom comes from experience. Experience comes from mistakes.

So, you see, most of us wise administrators got our experience somewhere else, if you know what I mean.

wink wink, nudge, nudge.

 

Re:Pretty simple, really (1)

Bruha (412869) | more than 4 years ago | (#30548902)

It's called an FOA (first office application). You do what modeling you can, check what you're changing, and rule #1 is don't fuck with something if you know nothing about it. We do it in the middle of the night, and if it screws things up we just restore the changed equipment to the pre-change state. Networks are too complex, and even the best lab modeling does not catch all situations.

Re:Pretty simple, really (1)

Marxist Hacker 42 (638312) | more than 4 years ago | (#30548904)

Either that, or redundancy, redundancy, redundancy. I always at least try to convince the bosses that hardware needs to be ordered in even numbers, so that we have on-site emergency replacements.

That extra hardware can then be used to build test beds.

The tag says it all (4, Insightful)

Lord Byron II (671689) | more than 4 years ago | (#30547848)

There are zero replies and the story is already tagged with "youreboned". That's the truth. If your higher-ups won't front the money for proper test equipment and expect you to roll out production-ready equipment on the first go, then you really are boned. Of course, you can mitigate this by simple pen-and-paper analysis. What should each piece of equipment do? Are the products we've selected appropriate for the roles we're going to put them in? These sorts of questions can find a lot of bugs without any sort of testing. If you think, "what would I do if it were the 1980s?" then you'll be fine.

Re:The tag says it all (5, Insightful)

DigiShaman (671371) | more than 4 years ago | (#30547938)

Not all changes are a one-way trip. Having a rollback plan is also important. Should something very unexpected happen, be prepared to roll back any and all changes to undo what has just been done.

Re:The tag says it all (4, Insightful)

BiggerIsBetter (682164) | more than 4 years ago | (#30547980)

Not all changes are a one-way trip. Having a rollback plan is also important. Should something very unexpected happen, be prepared to roll back any and all changes to undo what has just been done.

Couldn't agree more, except to say: don't assume you'll be rolling back from a known state. I've seen roll-back plans that assume they're undoing the changes just put in, not reverting to the state before the changes. Yes, there's a difference between the two! E.g., if your install fails, maybe you can't uninstall. Yes, this might mean additional resources and the overhead of FS and DB snapshots, and complete copies of config files, but better that than the alternative.

Re:The tag says it all (4, Insightful)

afidel (530433) | more than 4 years ago | (#30548244)

This is networking equipment; other than transitory information like peer maps and MAC tables that can be re-learned, you should always be able to revert to the previous state as far as software and configuration go.

My comment is that out-of-band management is the networking guy's best friend, and POTS is the best OOB available. Also, learn how to change the running config without affecting the saved config; that way the worst case is you have to power cycle (which can be done with the correct OOB setup, or you can pre-schedule a reboot that you cancel if everything goes well). Oh, and downtime windows might seem like a luxury, but unless you are Google or Amazon the business needs to be made aware that they are necessary and critical to the smooth functioning of their IT infrastructure, so you should be making these changes during the downtime window, where everyone is aware that things might break.

Re:The tag says it all (2, Informative)

karnal (22275) | more than 4 years ago | (#30548264)

You bring up a good point regarding changing the running config vs the saved config.

What I'll do if I'm changing a remote system - POTS or no - is set up a reboot of the device in 15 minutes, after verifying the clock. Then, if something in the config causes an unforeseen issue, you just need to wait a little for the switch/router to come back online with its original config.

Obviously, this can extend the outage window - however, always plan for worst case...
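
For illustration, a minimal IOS session following that pattern might look like the sketch below (the device name, interface and the change itself are made-up placeholders; the 15-minute window matches the habit above):

! sanity-check the device's idea of time, then arm the safety net
switch# show clock
switch# reload in 15
! make the change in the running config only (placeholder example)
switch# configure terminal
switch(config)# interface GigabitEthernet0/1
switch(config-if)# description uplink-to-core
switch(config-if)# end
! verify you still have connectivity, then disarm the reboot and save
switch# ping 10.0.0.1
switch# reload cancel
switch# copy running-config startup-config

If you lock yourself out before "reload cancel", the box simply reboots back to its saved config at the 15-minute mark.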

Re:The tag says it all (2)

afidel (530433) | more than 4 years ago | (#30548626)

My favorite ultimate backup for rebooting a device is a DTMF-controlled PDU: call into the OOB number, hit a magic number sequence, and the device reboots =)

Re:The tag says it all (1)

LordAzuzu (1701760) | more than 4 years ago | (#30548658)

Regarding running config and saved config: some time ago I wrote an iptables script that would test a new rule chain for a specified amount of time, then revert to the previous one. It has saved me many times; I've actually locked myself out of the machine a couple of times (a remote one, obviously).
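
A rough sketch of that idea, assuming iptables-save/iptables-restore are available and that the candidate rules live in a hypothetical script (paths and the timeout are placeholders):

#!/bin/sh
# Try a new ruleset, revert automatically unless told otherwise
TIMEOUT=${1:-120}                            # seconds to keep the candidate rules
iptables-save > /tmp/iptables.rollback       # snapshot the current ruleset
sh /etc/firewall/candidate-rules.sh          # apply the candidate rules (hypothetical script)
sleep "$TIMEOUT"
iptables-restore < /tmp/iptables.rollback    # revert to the snapshot

If the new chain locks you out of SSH, you just wait out the timer; if it behaves, kill the script before the sleep expires and make the rules permanent.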

Re:The tag says it all (0)

Anonymous Coward | more than 4 years ago | (#30548560)

And yet again, stupidity reigns on Slashdot! I fail to see how you get modded +5 when the author clearly stated he's a network guy. Long story short, he doesn't care about DBs; your post is pointless! On a Cisco router, simply avoid doing a

copy running-config startup-config

unless you have a backup of running-config before said changes.

DB snapshots have nothing to do with it!

Re:The tag says it all (2, Informative)

Anonymous Coward | more than 4 years ago | (#30547970)

Not pushing Juniper gear, but the commit functions in JUNOS and commands like "rollback" are serious things to consider in these scenarios. JUNOS also does things like refusing to perform a commit if you've done something obviously stupid (it does basic checking of your config when you commit).

Label me a shill. Whatever. JUNOS is a lot better from an operator POV.

Re:The tag says it all (2, Informative)

mysidia (191772) | more than 4 years ago | (#30548160)

My personal favorite thing about JunOS is "commit confirmed 10"

This can be a lifesaver: if you fat-fingered something and broke even your ability to access the device, your transaction should roll back in 10 minutes.

If nothing goes wrong, you have 9 minutes to do some simple sanity checks, make sure your LAN is still working, and then get back to your CLI session and confirm the change.
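
Roughly, a session using it looks like this (only the 10-minute timer comes from the parent post; the ping target is a placeholder):

[edit]
user@router# show | compare
user@router# commit confirmed 10
user@router# run ping 10.0.0.1 count 5
user@router# commit

Review the diff, commit with a 10-minute automatic rollback, run whatever sanity checks you need, then issue a plain commit inside the window so the change sticks.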

Re:The tag says it all (2, Informative)

POTSandPANS (781918) | more than 4 years ago | (#30548528)

On a Cisco, you can just do "reload in 10" and "reload cancel". If you don't know about those commands, you really shouldn't be working on a production network unsupervised.

As for the original question: Either use similar low end equipment, or use your spares. (please say you keep spare parts around)

Re:The tag says it all (2, Interesting)

mysidia (191772) | more than 4 years ago | (#30548828)

"reload in 10" on a core router or switch (eg a massive switch that also has routing duties) is insane, and will probably impact the entire network, for 20-30 minutes, if you accidentally lock yourself out (but don't otherwise impact anything) and fail to cancel that reload.

In addition, reload is risky, and the equipment may fail to come back up correctly.

Sorry, it's not anywhere close to comparable to the configuration management features in JunOS.

"Reload in X" is a bad answer, and should never be done, except on equipment that doesn't matter that much, or at a time when an hour of downtime is completely expected and acceptable.

Re:The tag says it all (0)

Anonymous Coward | more than 4 years ago | (#30548196)

Rollback and committing can be found in Cisco IOS XE, NX-OS and IOS XR. So any high-end platform that needs these functions...

Re:The tag says it all (1)

GaryOlson (737642) | more than 4 years ago | (#30548366)

This is not 1985 anymore; rollback should be included by default in any networking equipment that purports to be suitable for line-of-business networks. Providing rollback in only "high end platforms" is a scam for PHBs and less-than-competent network managers to waste money and feel good about themselves.

Re:The tag says it all (0)

Anonymous Coward | more than 4 years ago | (#30548674)

I agree, I am currently moving away from Cisco to JUNOS and I am loving it.

No, there's nothing wrong with that (1)

Rix (54095) | more than 4 years ago | (#30548240)

As long as the downtime that will result is acceptable.

Re:The tag says it all (3, Interesting)

eggoeater (704775) | more than 4 years ago | (#30548554)

I'm a call-center telephony engineer. Kinda the same thing as network engineer in that you're routing calls instead of packets.
Back around '01, I was working for First Union (which later became Wachovia). They had this massive corporate push for anyone and everyone in IT to roll out a standardized Software Configuration Management [wikipedia.org] process, and of course we were included. The big problem was the lab. The corporate standard was to test changes in a lab environment and then move to production (duh).
For a telephony environment, we had a pretty good lab that could duplicate most of our production scenarios, but not all. Another problem was that there were a LOT of people with their fingers in the lab, since so many groups were involved: e.g. the IVR team is in there because you have to have IVRs in the system. Same with call routing, call recording, desktop software, QA, etc., etc.
So the lab was in a constant state of flux, with multiple products, multiple teams, different software cycles, and endless testing always occurring. We made it work by testing the stuff we weren't sure about in the lab, only doing changes in prod after hours, and having really good testing and back-out plans.
So when the corporate overlords started telling us we couldn't make any changes to production without running everything through the lab first, we basically laughed and told them we'd need around 500 million for the lab and dedicated resources to run it. I ended up telling them that to duplicate the production environment, we'd need another bank as our "test bank", and we could test changes on the test bank and then put them in the production bank.

As with so many things in that IT department, it went from being a priority to fading away when something else became a priority.

Re:The tag says it all (1)

Stripe7 (571267) | more than 4 years ago | (#30548830)

Pen-and-paper analysis may not find all the issues. We had a weird one that flummoxed a bunch of network engineers. It was an IOS upgrade to the built-in fiber bridge on a blade server. The old IOS worked fine; the new one worked until you tried to jumpstart a blade. Jumpstarts worked fine with the old IOS but not with the new one. As we rarely jumpstarted the blades, this issue was not caught until after the bridges on all the blade servers were upgraded.

Re:The tag says it all (0)

Anonymous Coward | more than 4 years ago | (#30549088)

Just make sure it works on your boss's system.

Mystery solved.

Could be worse (4, Insightful)

7213 (122294) | more than 4 years ago | (#30547858)

The best bet is to be ready to blame the vendor when things go south ;-)

Seriously, I'm right there with you. If management does not want to provide a test lab and reasonable time to test, then it's clear they've made a 'business decision' that the network is not of sufficient value / the risk is not great enough for such investments.

This may change quickly once something goes south (assuming they understand why it did) but you're gonna be talking to a brick wall until then.

It could be worse: you could have management that are afraid of their own shadows and who freak out at the idea of replacing redundant components after a HW failure. (Ever had to get VP approval to replace a failed GBIC? Oh, I have, and yes, I hate my life.)

Re:Could be worse (2, Interesting)

mysidia (191772) | more than 4 years ago | (#30548186)

See how much approval you have to get when the network is down because of a failed GBIC.

Redundancies against component failure are very good for the enterprise, but also make it harder for engineers to do their job, since "nobody notices that something has gone wrong".

Perhaps the real redundancies should be reserved for the absolute most business-critical things.

Make sure less important things are non-redundant and arranged in such a way that if any link or GBIC does fail, something noticeable to management stops working and cannot be restored without fixing the broken part.

Re:Could be worse (2, Insightful)

hazem (472289) | more than 4 years ago | (#30548506)

That reminds me of an article by Nelson Repenning, "Nobody ever gets credit for fixing problems that never happened". It's quite an interesting read... The guy who "saves the day" during an emergency always seems to get credit and reward, but what about the guy who keeps the emergency from ever happening?

Re:Could be worse (1)

Sulphur (1548251) | more than 4 years ago | (#30548910)

One is Willie Mays, and the other is Joe DiMaggio.

Mays would make impossible catches, but DiMaggio was in the right place and the catch looked easy.

Re:Could be worse (1)

The Wild Norseman (1404891) | more than 4 years ago | (#30549056)

One is Willie Mays, and the other is Joe DiMaggio. Mays would make impossible catches, but DiMaggio was in the right place and the catch looked easy.

Can someone please explain this sports analogy with a car analogy so I can understand it?

Re:Could be worse (1)

mysidia (191772) | more than 4 years ago | (#30548912)

I think it's taken for granted as an expected part of the job, that the minimum things engineers/architects are supposed to do is prevent emergencies from happening.

If a bad enough emergency does happen, they might get fired for 'not doing their job', but they'll rarely ever get commended when their design works and protects the enterprise against certain doom.

Except by other engineers... I think (to some extent), that's just life.

How's a non-technical person supposed to tell the difference between the network being stable because it was well designed, and the network being stable because the thing that can bring it all down just hasn't happened to have any issues yet?

You'd be surprised how long a network with crucial issues can appear on the surface to be just fine, only to one day have a catastrophe due to the poor design, years later, when least expected....

Only network engineers are really qualified to give this type of credit, whereas any bimbo off the street can see when someone fixed an emergency [even if their own mistake caused it -- many people will never admit guilt, and by avoiding admitting it, they can make it appear they are cleaning up after someone or something else] :)

Virtualization? (4, Interesting)

bsDaemon (87307) | more than 4 years ago | (#30547860)

It's perhaps not the best solution, as a lot of the problems I've faced since I started getting more into networking stuff than software configuration and web server administration have been related to bad cables rather than bad IOS settings, but virtualization can help you create test situations on the cheap. Specifically, GNS3 allows you to create test networks in a virtual environment, then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc., all running on hypervisor technology.

You can also use QEMU to create virtual network nodes. If you have enough RAM, this can help at least get the logical issues worked out and the software configurations square. Then you just need to do the real work :) I'm still pretty new to networking myself, and I use it to make little test labs when I need to do more than I can with the two 3600- and 2600-series routers I got to take home for experimenting. I actually copied the IOS images off of them via TFTP and can replicate them as many times as I need to, plus I can claim I have whatever interfaces I need, and it will (thankfully) simulate the ATM switch for me as well.
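
For reference, pulling an image off a router to a TFTP server is a one-liner along these lines (the image filename and server address here are placeholders):

router# copy flash:c2600-image.bin tftp://192.168.1.50/c2600-image.bin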

Re:Virtualization? (2, Informative)

loki_ninboy (992401) | more than 4 years ago | (#30547914)

I'm using the GNS3 software with some IOS stuff to help prepare for the CCNA exam. Sure beats paying the money for extra hardware lying around the house just for learning and testing purposes.

Re:Virtualization? (1)

afidel (530433) | more than 4 years ago | (#30548256)

Almost as importantly, with a simulator you don't have to POWER all that equipment; my CCNP lab almost maxed out a 20A circuit.

Re:Virtualization? (4, Informative)

value_added (719364) | more than 4 years ago | (#30547964)

Specifically, GNS3 allows you to create test networks in a virtual environment, then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc., all running on hypervisor technology.

For anyone unfamiliar with GNS3, a link to the website [gns3.net] . There are versions available for Windows, Linux, and OS X. FreeBSD already has it in ports.

As a side note, I'd add that maintaining a home lab (to the extent practicable and useful) is one way to side-step limitations of what your employer provides. Consider it a combination of "Ongoing Professional Education" and "Proactive Job Security Measures" (i.e., "I better test this shit to save my ass tomorrow").

Re:Virtualization? (2, Informative)

Bios_Hakr (68586) | more than 4 years ago | (#30548688)

If you work in a pure Cisco environment, talk to your Cisco guy about getting Packet Tracer. It emulates a few routers and a lot of switches. It works really well. Plus, 5.1 adds virtual networking. You can design several networks on several laptops and then join those networks over a virtual internet.

You could simulate the net with Packet Tracer (1)

sconeu (64226) | more than 4 years ago | (#30547880)

Granted, it's not really an ideal solution, but it may wind up being the only way to avoid using production equipment.

Document and test at night (5, Informative)

jdigriz (676802) | more than 4 years ago | (#30547884)

Step 1) Make a formal request for the test lab. Make it as detailed as possible. Explain the impact to business if various components fail. Make a plain-language executive summary calling out risks.
Step 2) Once the request is denied, make sure you have a paper trail of the rejection.
Step 3) If possible test network changes on the production equipment at 2am so that impact on users will be less.
Step 4) Once the inevitable failure occurs, haul out the paper trail and get the bean counter fired. Repeat until the test lab is approved.
Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.

Re:Document and test at night (1, Insightful)

Renraku (518261) | more than 4 years ago | (#30547962)

If you get fired for failing to do a job for which you were not equipped (and they know you aren't equipped for it), you might be able to sue because they created a hostile work environment. Hostile work environment lawsuits aren't just for sexual harassment, folks.

Re:Document and test at night (1)

nametaken (610866) | more than 4 years ago | (#30547986)

There's a potential hitch or two in your plan.

If it goes smoothly anyway, you might look like a whiner that didn't need the expensive toys to keep on the shelf. They feel vindicated. If it goes poorly they'll assume you didn't really try because you wanted to prove yourself right.

Re:Document and test at night (4, Funny)

SethJohnson (112166) | more than 4 years ago | (#30548056)

If it goes smoothly anyway, you might look like a whiner that didn't need the expensive toys to keep on the shelf.

Hence, you have the plug to the main router beneath your own desk. When the sailing looks smooth, you kick out the cord. While everyone freaks out, you open up a terminal window and begin typing nonsensical commands. Say, "Ahaaah!" as you plug the router back in.

Job security.

Seth

Re:Document and test at night (1)

FrankDerKte (1472221) | more than 4 years ago | (#30548280)

Sounds like Bastard Operator From Hell to me. But it could be the only defense against Incredibly Incompetent Manager From Hell.

Re:Document and test at night (1)

maxume (22995) | more than 4 years ago | (#30548392)

I use fire.

Re:Document and test at night (1)

mysidia (191772) | more than 4 years ago | (#30548934)

Until one afternoon when the janitor unplugs something from a power strip under your desk, to get an outlet for their vacuum, and the main router happens to go down....

Re:Document and test at night (1)

GaryOlson (737642) | more than 4 years ago | (#30548390)

Please tell us all how you convinced an electrician to install dual L6-30 208V plugs beneath your desk. And how kicking said twist lock plugs -- both of them -- will cause the plug to come loose.

Re:Document and test at night (1)

mysidia (191772) | more than 4 years ago | (#30549000)

Not power, the Ethernet feed :)

Router's LAN Interface -> Patch Panel -> Ethernet Port Under your desk -> Straight cable plugged into other port under your desk -> Patch Panel -> Firewall/Security Appliance Outside LAN Interface.

So by kicking out that cable, you separate LAN from router...

Or, by kicking out power to the 5-port hub under your desk you temporarily used to sniff traffic between the firewall and your site's edge router.

Re:Document and test at night (1)

mybecq (131456) | more than 4 years ago | (#30548586)

Say, "Ahaaah! As you re-plug in the router."

With your feet? You ARE talented!

Re:Document and test at night (0)

Anonymous Coward | more than 4 years ago | (#30548810)

You're the man, man!

Re:Document and test at night (1)

jdigriz (676802) | more than 4 years ago | (#30548224)

Are we in the same business? No one ever notices IT when things go well.

Re:Document and test at night (3, Informative)

Keruo (771880) | more than 4 years ago | (#30548000)

step 3) If possible test network changes on the production equipment at 2am so that impact on users will be less

Been there, done that. Sadly the only way to see how your setup works is to try it in production.
Sure it helps if you can test it beforehand, but sometimes your lab might not reflect what happens in the real network when you roll something out.
Just make sure you can clock those am hours as overtime/nighttime work.
And remember to back up the running config twice so you can restore the production network if something goes fubar.
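
For example, one copy on the box and one off-box (the filenames and the TFTP server address are placeholders):

router# copy running-config flash:pre-change.cfg
router# copy running-config tftp://192.168.1.50/router01-pre-change.cfg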

Re:Document and test at night (1)

dkf (304284) | more than 4 years ago | (#30548216)

Been there, done that. Sadly the only way to see how your setup works is to try it in production.

The other thing to mention is to be honest with the other technical staff about whether you've actually made a change, even a "trivial" one. This is because sometimes when you modify something, you can end up dumping them in the shitter accidentally, e.g., by putting a critical service on the wrong side of an internal firewall so that no packets get routed to it at all. In fact, I saw that once, and networking stonewalled for a week before admitting that indeed they had made a small modification "that shouldn't have affected anything" and which cost quite a lot of lost work. By being honest and humble, they'll cut you more slack later if/when the boot's on the other foot.

OTOH, if management are insisting that all communications get routed through them, you're screwed. (NB: that's not the same as managers getting Cc'ed.)

Re:Document and test at night (1)

mysidia (191772) | more than 4 years ago | (#30549092)

In some environments, that is frustrated by other (lazy) technical staff, who immediately start blaming _every_ problem they find for the next few weeks on that one change, without doing any helpful troubleshooting or finding any reason at all to suggest that's actually the cause.

The problem is unrelated and would happen anyway, but because they heard of a recent change, there is a cognitive bias towards immediately suspecting the new change, just because it's a change they know about.

"I didn't change anything, so if I just started getting a few problem reports it must be your change"

This is the sort of thing that may annoy some technical workers, and possibly cause them not to report certain minor changes as widely as they could. Desktop support should not care much, for example, if the network team changes security measures on routers protecting administrative access, or performs regular password changes; there are lots of minor changes that don't merit announcing.

It's trouble enough that technical staff (esp. desktop admin types) often seem to automatically assume that perfectly innocent network devices (routers, firewalls, switches) need to be rebooted, before exhausting obvious causes like software/Windows problems.

"Someone was getting '504 page not found' errors trying to reach some web site.. so i'm power cycling the router labelled "Catalyst 6509-E core switch" in the wire closet, to see if it helps.. (You're doing what??)"

Re:Document and test at night (1)

dbIII (701233) | more than 4 years ago | (#30548370)

Sure it helps if you can test it beforehand, but sometimes your lab might not reflect what happens in the real network when you roll something out.

That means your experimental model is not good and needs to be refined.
You see, all those guys who did a six-month course and call themselves "engineers" could have benefited from a real engineering education or the experience of working with real engineers.
Meanwhile I have idiots learning about routing or DHCP on production systems because they can't be bothered to go into another room and turn on the power switches to run things on a development network. It's very nice when developers work at improving their skillset, but the attitude of trying things to see if they work instead of RTFM is not compatible with anyone else using those things at the same time. We've been poisoned by the MS-DOS mindset of the single user, poor documentation and the "just reboot and see if it works now" attitude.

Re:Document and test at night (1)

timmarhy (659436) | more than 4 years ago | (#30548510)

Yep, and the other fail here is that a lot of production environments are 24/7. There is NOT a slow point, ever.

Re:Document and test at night (3, Interesting)

Anonymous Coward | more than 4 years ago | (#30548472)

Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.

And that's what happened to me.

I was forced into making changes in the production environment, and caused an outage that affected 2 people. Once I realized what happened, I quickly fixed it; however, due to internal politics, I was terminated the next day.

Initially I was in shock. 10 years, 2 months employed at a single company. Gone. I have a stay-at-home wife and 3 kids, which made things look even bleaker.

In hindsight, it may be one of the better things to happen to me. I had spoken with a recruiter a few days beforehand to start looking for work. When this happened, I was able to dedicate myself full time to job-searching. I was also off for hunting season, and able to do many things with my family that I normally wouldn't be able to do. The environment there was just awful. Several former co-workers have left since my special day. The CTO is a psychopath. He has 2 sayings he likes to use: the first is 'to do the job right the 1st time'; the second is a Mario Andretti quote, 'If you don't feel like you are out of control, then you aren't going fast enough'. These sayings are mutually exclusive, but logic doesn't apply.

I start a new position on Jan 5th (but it is only a 6 month contract position). It is a bit more money, and I have about 1/2 the commute. It is also a much better work environment.

Things I learned:

- Stockholm syndrome is apparently real. I didn't want to leave because 'it's not that bad'. It was bad. Worse.
- I hate job hunting.
- Employment law in Ontario, Canada is not what I thought it was. Pretty much everything I thought I knew was wrong.
- The economy here in Ontario is poor, but improving (though vastly better than in the US).
- Legal advice in Ontario is tax deductible (at least in reference to employment issues).
- A certain CTO is a complete and total prick.

(ha - my captcha word is 'inaction')

Re:Document and test at night (0)

Anonymous Coward | more than 4 years ago | (#30548698)

Man you really got screwed. I am sorry to hear that and I hope you find a great job soon.

Re:Document and test at night (2, Interesting)

SharpFang (651121) | more than 4 years ago | (#30548500)

3) If possible test network changes on the production equipment at 2am so that impact on users will be less

That's dangerous. You leave it apparently running and crawl back to sleep at 4:30AM, to get an angry call at 7:05AM when the first users to log in report something essential is fucked up.

Prepare and test at 2AM, then roll back to original. Then re-apply around lunch break and wait with your fingers on roll-back for the first reports of failure.

Re:Document and test at night (1)

BiggerIsBetter (682164) | more than 4 years ago | (#30548514)

3) If possible test network changes on the production equipment at 2am so that impact on users will be less

You're a network guy, right? How well do you know the applications that use your network? How sure are you that the applications behind, or in front of, the change you're making don't need a restart after losing connectivity? Maybe your late-night tests are causing all sorts of problems and expense when the apps guys come in to find the system inexplicably down, having visible outages, and having to raise support requests against vendors to find a solution to their non-reproducible high-severity defect in production. Don't do that.

Re:Document and test at night (1)

jdigriz (676802) | more than 4 years ago | (#30548876)

Actually no, I'm an applications and server guy. I had to learn networks because our network guys were incompetent. However, if your applications get to such a bad state that they need to be restarted due to a loss of network connectivity, they're badly written. And if the applications guys don't know about the sensitivity of their apps to network outages, and aren't actively monitoring their servers for interrupted services, then they don't know their applications very well either. In any case, if Step 3 causes the problems you mentioned, then we go on to Step 4 and the problem is solved, one way or another.

Re:Document and test at night (0)

Anonymous Coward | more than 4 years ago | (#30549048)

Step 3a: pull some tiles from the machine room floor, disconnect the lights and call the bean counter in for an "urgent meeting". Be sure to have plenty of quicklime handy...

My last resort (5, Funny)

tchdab1 (164848) | more than 4 years ago | (#30547886)

I call my buddies at RIM and test my mods on their system.

Re:My last resort (1)

sparkin_nz (1093581) | more than 4 years ago | (#30548664)

Is that some kind of RIM job?

Boson and VMware (1)

Zlurg (591611) | more than 4 years ago | (#30547890)

Seriously, try to find as much virtual equipment as you can and replicate your production environment as closely as possible. If you run one of the myriad sniffers on a VM, you might even come up with a clever way to send production traffic to your virtual lab. There is no other way to do it. You are screwed, so if you're serious, you can either buy the lab yourself or make one out of tin cans, coconuts and wet rope.
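
On Cisco switches, the usual way to feed a copy of production traffic to a lab or capture box is a SPAN session; a minimal sketch (interface numbers are placeholders):

switch(config)# monitor session 1 source interface GigabitEthernet0/1 both
switch(config)# monitor session 1 destination interface GigabitEthernet0/24

Whatever sniffer or VM host sits on Gi0/24 then sees a mirror of the traffic on Gi0/1 without touching the production path.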

Re: Testing Network Changes When No Test Labs Exi (2, Interesting)

droz037 (1323053) | more than 4 years ago | (#30547930)

I would suggest asking your vendors for demo or evaluation equipment. Cisco, Juniper and 3Com have pools of demo equipment as do the resellers like PC Connection and CDW.

I've done deployments of new switching infrastructure based on work I've done with loaners from my vendors. It can be tough because the typical evaluation period is 30 days, although you can get 45 and even 60 days.

If you have a good relationship with your sales rep, it would be easy to push them to get the necessary items to do basic testing and nail down the concepts of how you need to deploy. Then get the config files so that when you do buy what you need, you're 85% of the way there.

Let it burn (0)

Anonymous Coward | more than 4 years ago | (#30547948)

One problem with the situation you are in is that you've got a work-around that has sufficed so far. So, you might WANT a test lab, but clearly you don't NEED one... because hey, if you really needed it you couldn't have gotten all this production stuff working, right? The only way this changes is when you've got multiple teams dealing with a production outage that takes a long time and costs a lot of money because you have to do some trial-and-error fixes to isolate the problem. Only THEN will you get your test lab, after an appropriate amount of paperwork and delay. The trick is doing this without the outage being perceived as your fault.
 

Packet Life (3, Informative)

z4ns4stu (1607909) | more than 4 years ago | (#30547960)

Stretch, over at Packet Life [packetlife.net], has a great lab [packetlife.net] set up that anyone who needs to test Cisco configurations can sign up for and use.

Tools (5, Informative)

Tancred (3904) | more than 4 years ago | (#30547976)

Here are a few tools:

GNS3 - http://www.gns3.net/ [gns3.net] - free network simulator, based on Dynamips Cisco emulator
Opnet - http://www.opnet.com/ [opnet.com] - detailed planning of networks, from scratch
Traffic Explorer - http://packetdesign.com/ [packetdesign.com] - plan changes to an existing network

Re:Tools (0)

Anonymous Coward | more than 4 years ago | (#30548322)

If you need to virtualize LOTS of nodes, IMUNES (http://imunes.tel.fer.hr/virtnet/) might be interesting. It uses the network stack virtualization now integrated into the FreeBSD 8 kernel and can run simulations with thousands of nodes, with circa 100 kB of virtualization overhead per node.

Good side: IPsec, link BER and bandwidth, lightning fast.

Bad side: no Cisco emulation; you get only basic switch/hub elements.

Old screenshot: http://old.tel.fer.hr/imunes/GUI-normal.gif

Re:Tools (1)

vvaduva (859950) | more than 4 years ago | (#30548414)

Those are great recommendations, thanks!

lots of little things (2, Informative)

wintermute000 (928348) | more than 4 years ago | (#30547988)

Older Cisco equipment can function just as well as newer gear for 95% of lab scenarios. You are very unlikely to need all the newer features.

Anything that can run IOS 12.3 and is newer than a decade old can do a lot more than you think. We do all our BGP testing on a stack of 2600s and 3600s and have never had an issue, even though production is 2800s, 3800s, etc.
Granted, there are features for which you do need the newer kit, esp. when syntax changes (e.g. IP SLA commands, newer NetFlow commands, class-map based QoS, to name three off the top of my head), but none of the core routing and switching features/commands has changed much since the introduction of CEF: they all do ACLs, route maps, OSPF, BGP, EIGRP, VLANs, spanning tree, rapid spanning tree, IPsec VPNs. I'm speaking from an enterprise POV, not a service provider, but I'd imagine if you are in a telco environment you wouldn't be lacking gear.

For many minor test scenarios, you can pick a test branch office and use the good old 'reload in XYZ' command to ensure that no matter how badly you stuff it up, everything will bounce and come back (just remember NOT TO COPY RUN START lol).

Then there's the sleight of hand methods:
- always ordering more for projects than you really need. Par for the course really, esp. as most project managers haven't a clue when it comes to the nuts and bolts of a big Cisco order.
- pushing for EOL replacements as early as possible; intentionally conflating end-of-sale with end-of-life.
- getting stuff in for projects as early as possible, so you have a month or two to use it as test gear.
- remembering that your lab need not mirror reality; scale down as much as possible. E.g. to simulate a pair of 4506 multilayer switches running VRRP, use a pair of 3560s. Use your CCO login and flash away to your heart's content (I know it's breaching licensing, but for test scenarios, meh).

Rancid? (1)

fuzzyfuzzyfungus (1223518) | more than 4 years ago | (#30548006)

It doesn't save you from doing stupid things, but putting your device configurations under revision control, using something like RANCID [shrubbery.net], can make rolling things back easier, as well as generally encouraging sanity around device configuration.
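
Even without RANCID, the core idea is just pulling configs on a schedule and committing them to version control; a crude sketch, assuming SSH exec access to the device and a hypothetical hostname and repo path:

#!/bin/sh
# Poor man's config versioning, run from cron ('core-sw1' and /var/netconfigs are placeholders)
cd /var/netconfigs || exit 1
ssh backup@core-sw1 'show running-config' > core-sw1.cfg
git add core-sw1.cfg
git commit -m "core-sw1 snapshot $(date +%F)" || true   # git commit exits non-zero when nothing changed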

Re:Rancid? (1)

SlamMan (221834) | more than 4 years ago | (#30548872)

RANCID's good. Also look at CatTools from http://www.kiwisyslog.com/kiwi-cattools-overview/ [kiwisyslog.com] for a similar Windows tool. Free for small networks, ~$550 for networks over 20 devices.

Go virtual! (3, Informative)

leegaard (939485) | more than 4 years ago | (#30548064)

If you are unable to recycle old equipment into your testlab you should go virtual.

For Cisco routers, GNS3/Dynamips (www.gns3.net) is your friend. Any recent PC or laptop will allow you to build a large and complex topology that will satisfy most experiments and even support you when doing certification preparation. It only works for routers, so switch-based platforms are out (like the 3570, 6500 and 7600). The good news is that the features are more or less the same and they more or less behave the same way. If "more or less" is not close enough, you need a replica of your production network, or at least a few devices of each type, to test what can be labelled as critical.

For Juniper routers, Google "Juniper Olive". It will run a Juniper router the same way Dynamips runs a Cisco router.

In both cases a proactive partnership deal with the vendor will be a good idea. Both Cisco and Juniper (and I am sure all other major network vendors) have programs where they will more or less advise, test and prepare the configurations for you. If you run a critical network this is money well spent.

In the end it comes down to the level of risk your management is willing to take. Ask them if they will allow the network to be less up since you are unable to properly test your changes before implementation.

Out of hours changes, and change management (2, Informative)

anti-NAT (709310) | more than 4 years ago | (#30548066)

For any sort of medium to large network, you can't fully simulate it. That means you're always going to be making changes to an effectively "untested" environment. So, you make very few changes at a time rather than lots, you make sure after each change that it has had the desired effect, and you have backout plans.

if you ask for it, you may not get it... (0)

Anonymous Coward | more than 4 years ago | (#30548076)

But if you write a proposal and show the benefits of having the right equipment and the operational costs of not having it, you might be able to get a Spirent TestCenter. Do a demo with some Linux/*BSD boxes running iperf, but remind them of the features and abilities you will get with quality network testing tools.
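
The iperf part of such a demo is only a couple of commands (addresses and duration are placeholders; this is the classic iperf2 syntax):

server$ iperf -s
client$ iperf -c 10.0.0.10 -t 30 -P 4

That gives you throughput numbers between two hosts; the pitch to management is what a dedicated test tool adds on top, such as line-rate traffic generation and impairment simulation.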

Borrow a lab! (3, Interesting)

jimpop (27817) | more than 4 years ago | (#30548092)

Cisco have many (large) labs located around the world. Sign up for some time in one of them.

Plan, inform and be prepared! (1, Insightful)

Anonymous Coward | more than 4 years ago | (#30548098)

Been there, done that (A LOT!!)
But it has failed quite a few times too..

If no money is available for test labs, make good plans... Tell the dudes that wanted the changes (or, if you are the dude that wants the changes, inform the correct people that you will be doing stuff). Agree on a service window. Have backup plans. Have all configurations saved. Let all users know that after 10pm on that Saturday the network will be down for 10 mins, etc.

Have tons of contingency plans, and let the 'responsible' people know what you are about to do. Plan everything 'wide': even for a 5-minute cable plugover, reserve a service window outside of office hours for 2 hours.

The power of logic compels me! (1, Interesting)

Anonymous Coward | more than 4 years ago | (#30548104)

You do not mention that this has ever made shit hit a fan. I conclude that so far this has not occurred.

Consequently, you have proved that you are able to work without expensive test equipment by a combination of motivation and elbow grease. Congratulations!

Now, what is the logic for someone with a finite pool of money to provide equipment for someone who obviously does not NEED it? Yes, None At All!

You can therefore:
1) Wait until shit hits a fan and say "well, that's what happens when we don't have test equipment". You will then get test equipment OR get fired.
2) Make the shit hit the fan yourself. This is quite difficult to do inconspicuously, so you'll probably get fired and a shit reference.
3) Look around for jobs as well paid as yours but with test equipment.
4) Someone mentioned asking vendors for test equipment - maybe that might work? Note: sales reps have a quota of favours they can call in, so it helps if you have some steady business with them.

Re:The power of logic compels me! (0)

Anonymous Coward | more than 4 years ago | (#30548438)

5) STFU, and make it work the first time.

Posting humorously, for obvious reasons.

Simulation (0)

Anonymous Coward | more than 4 years ago | (#30548120)

That is what simulation/network planning software is for. For example OPNET: http://www.opnet.com

In case it explodes (1)

Ximok (650049) | more than 4 years ago | (#30548140)

reload in 5

I'm dead serious. If you are on production stuff and you screw it up remotely, you can at least tell it to reload and pull its old config. You have some downtime, but it's better than the downtime you'd experience if you had to drive out there.

Paper Trail (3, Interesting)

tengu1sd (797240) | more than 4 years ago | (#30548176)

>>>refuse to provide funds for expensive lab equipment, test circuits and for reasonable time to get testing done before moving equipment or configs into production.

Make sure that every change request implementation documents that the change is being placed into the production environment for testing. Document impact ranging from total network failure to moderate inconvenience, and include rollout timetables. The rollout needs to include travel times, such as driving to site B or flying cross-country.

Of course the downside of this is that management may go out and hire someone who knows, or at least pretends to know, how to drop changes into place without whining about ignorance and making customers uncomfortable.

Some pointers (1)

pehrs (690959) | more than 4 years ago | (#30548178)

It depends a lot on your environment and the complexity you are dealing with. Test labs are wonderful things, but typically you end up in a situation where your network is so limited that a lab won't help much, or your network is simply too complex to create a sane lab environment without dedicated staff and a huge budget.

Building a full scale lab is a large undertaking. It takes time and effort. You will need taps (for routing information), traffic generators, topology management and more. In my experience it's usually better to have a smaller testbed that is used to test large changes before deployment and design your network so it's resilient to configuration mistakes.

Getting funding for a limited testbed is also much easier than for a full lab, and you can do a lot of testing by simply stuffing a few routers in a rack and connecting them to the network management system. Virtualization is something a lot of people will mention. It's useful, but it's hard to build anything resembling a modern network on top of it. You want hardware that resembles what you use in the network. Sometimes you can scavenge such hardware during upgrades, which can provide you with a basic testbed to build from.

Don't waste the next crisis/flap (0)

Anonymous Coward | more than 4 years ago | (#30548226)

When it happens, point out (on paper!) to your management chain how it could have been prevented with a decent test configuration in place.

Don't forget SOX (2, Informative)

jackb_guppy (204733) | more than 4 years ago | (#30548348)

1) You should not be making any direct changes to the network without correct design, testing and sign-off.

2) You should already have a redundant network structure, so "half" can be lost without any loss of network operations. This way the change can be tested in parallel.

3) You should always report to the SOX officer when a request outside correct operations and management is made. It makes it their responsibility to resolve the legal issues of not following their written standards before you begin.

its the time to (1)

GooseYArd (96708) | more than 4 years ago | (#30548356)

polish your resume.

Download vyatta (1)

Sxooter (29722) | more than 4 years ago | (#30548364)

Download an iso from Vyatta and build a test network with old PCs and spare NICs for testing. Sure, it's not exactly the same as Cisco, but if they're too cheap to buy the real thing for a test lab then you'll at least be somewhat close.

Then, once you realize what you're not getting for your money with Cisco, you can buy $1000 1U servers and build your own routers (or buy them prebuilt from Vyatta for about $2000) to replace the Ciscos and make a profit selling the used Ciscos on eBay.

I do NOT work for nor am I affiliated with Vyatta. But their gear is pretty impressive, and open source.

UNH-IOL (1)

slugmass (1215630) | more than 4 years ago | (#30548386)

The UNH-IOL is a neutral, third-party laboratory dedicated to testing data networking technologies through industry collaboration.

http://www.iol.unh.edu/ [unh.edu]

Get it in writing, let it fail. (1)

timmarhy (659436) | more than 4 years ago | (#30548388)

Make your objections in writing: email the manager demanding the change you believe places production at risk, with the risks clearly outlined in bullet points. If he then insists you proceed, make him send you the request in writing/email, print out a duplicate, keep it in a safe place, and then make his change. This way he owns the failure, not you. Paper trails exist for a reason: to cover arses, and arse-covering is often a worthwhile exercise.

You answered your own question. (1)

mnslinky (1105103) | more than 4 years ago | (#30548654)

As you already said, we secretly test on production in such cases.

Only a matter of time (1)

w00ten (1030874) | more than 4 years ago | (#30548736)

It's only a matter of time until a change that wasn't properly tested completely screws everything up and some exec is looking at you for answers. I've learned that the best interpersonal skill to have is deflection. Nice guys finish last, especially in a corporate environment, so try to get test equipment, and when they say no, like all companies do, SAVE IT so you can blame someone else! This is what you can send to the CTO when he asks why you didn't properly test the changes that caused the company to lose millions of dollars in operating costs because the network was down for 6 hours: "Well, I warned people in this email trail and proposal, but they shot me down, and I was right." If by some incredible miracle this never has to happen, then count your lucky stars, and when they ask why nothing has gone wrong, toot your own horn and say that it's because you are so damn good. No matter what, you show value, you secure your position. As for basic testing, any of the programs mentioned here will work; Packet Tracer is limited in the models it supports, so you might want to look at something else first.

try clownix (1)

peril (11405) | more than 4 years ago | (#30548776)

http://www.clownix.net

I did a write-up on this product at the beginning of this month. It can run Quagga routers in the UML image of your choice; I wrote/ran a 12-router lab on a P4 with 512 MB of RAM. (http://www.vlcg.net/content/cloonix-clownix-rocks)

If this product were used, you would only be able to functionally test the protocols in a particular topology; it wouldn't be Cisco, and it wouldn't be the same as production (different protocols, different topologies).

I discovered this trying to figure out a way to run Quagga in a GNS3-like setup. GNS3 is great for testing a specific Cisco thing that you need to learn about, but it didn't do well for me beyond 3 routers (too much hand-holding getting the environment tweaked).

My ultimate vision for Quagga would be to run it on the hypervisor and let it scale (in number of routing instances) with the number of hypervisors. It's a pipe dream for now, but I think that routing that can scale with hypervisors is going to be a big challenge for Cisco (esp. if they try to do it in silicon).

--Adrian

Don't do it (1)

tedgyz (515156) | more than 4 years ago | (#30548938)

Management hates paying for double the equipment, but for any production environment it should be the cost of doing business. It minimizes risk and provides hot spares faster than an HP (or whatever) tech can show up. You should get some duplicate hardware for staging.

If you can't do that, then refer to the earlier post [slashdot.org] - don't fsck up.

Hope you're good and/or have a good relationship. (0)

Anonymous Coward | more than 4 years ago | (#30549266)

I work in a small IT house that provides network support for quite a few customers that are not large enough to have their own IT people.

We're very Windows-centric (yeah, I know, boo) and have no budget for any test equipment/training, yet I am expected to be up to date on changes in Windows. To make matters worse, I'm not even supposed to have the time to test things out on our internal network, and the pay is low enough that I can't afford to purchase equipment to test at home on my own time.

So, what works (kinda) for me has been to keep an eye out for equipment that has been abandoned by our sales team (usually through extensive hardware problems that cause a customer to decide it's not reliable enough for their network), and/or to take equipment off the sales shelf for testing. For the software/knowledge side, I will quite belligerently tell my boss to go away when I'm testing something that needs to get tested. This requires that you have a certain amount of clout and/or your boss is afraid enough of you quitting to let you get away with it.

On the customer/end user side, develop some sort of personal relationship with them, whether that be going out for drinks with them periodically, knowing what they do for fun and/or having them know what you do for fun (no, gaming doesn't cut it). Be up front with them when something does mess up (literally saying that you didn't realize that what you were doing might have that problem).

Never, ever blame someone else unless you're sure it's their fault; take the blame yourself. This'll save your ass when it really isn't your fault and someone tries to pin it on you.

---
Having said all of that, what you (and I) should really be doing is looking for a new job.
