
No Hassle RAID 5 Implementations?

Cliff posted more than 12 years ago | from the it's-got-to-be-easier-than-this dept.

Hardware 51

LambSpam asks: "I had a nightmare week (last week) with two of our servers running Intel's U3-1L RAID controller (RAID 5). Whenever there's a power outage in our building these controllers randomly mark one or more of the drives in the array offline (even with adequate UPS support), which means I have to manually mark them online and/or rebuild. Intel acknowledged the problem, but their solution involves updating the backplane's firmware, the controller firmware (destructive upgrade!), and even the firmware on our IBM drives in the array because they 'draw too much power' in certain conditions. I've only used one other RAID 5 implementation (MegaRAID), and it NEVER had these kinds of problems, whereas if you sneeze too hard around this U3-1L card it will go offline. Is this common with most hardware RAID implementations? What RAID 5 implementations work without hassle? What should I stay away from?"


51 comments


PERC? (2)

AnalogBoy (51094) | more than 12 years ago | (#3172405)

I've never had any problems with the PERC (PowerEdge RAID Controller) in the Dells I used to use for Sendmail servers. That kind of limits your choices, though.

Re:PERC? (3, Interesting)

krangomatik (535373) | more than 12 years ago | (#3172465)

I haven't personally had any big problems with the PERC boards, although friends and co-workers always seem to have had bad experiences with them. I've had really good luck with IBM ServeRAID boards. We have quite a few of these in production boxes and haven't had any problems with them (the IBM hard drives, on the other hand... plenty of failures there). If your RAID problems are big enough that you're willing to put up lots of $$$ to get rid of them, you could look at buying a SAN or NAS. That way, in theory, you could have the vendor install and maintain the disk for you. Generally they seem to do an okay job. I must mention, however, that I have seen a vendor make an oops and drop power to an array while trying to fix a power supply problem. That took some time to get back online because the CE out on site wasn't familiar with that product and ended up having to get a senior CE to drive out and fix it. All in all, it seems like the big boys (IBM, EMC, Sun, STK, etc.) are pretty good about keeping uptimes in the 99.99%+ range (I guess that's what you give them the big bucks for).
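For a sense of scale, an uptime percentage converts to a yearly downtime budget with simple arithmetic. The following is only a sketch; the function name is made up for illustration and the year is taken as 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common availability targets.
for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

By this arithmetic, 99.99% works out to a little under an hour of downtime per year.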

Re:PERC? (3, Interesting)

AnalogBoy (51094) | more than 12 years ago | (#3172494)

Keep in mind there are a few different versions of the PERC, some better than others.

Just a note on EMC: when I've had the joy of working with a Symmetrix, EMC has always done a wonderful job of never having any downtime. They would come out at any hour of the day or night to replace a redundant card or a spare disk that wasn't even being utilized. They always evaluate any changes before they are made. I'm sure it's possible for them to make a mistake, but for mass storage they're the ones I would choose.

Re:PERC? (3, Interesting)

foobar104 (206452) | more than 12 years ago | (#3172498)

All in all, it seems like the big boys (IBM, EMC, Sun, STK, etc.)

Just FYI, Sun doesn't actually make their high-end storage product. I think they call it the StorEdge 9900 or something but it's actually a rebranded Hitachi Data Systems 9960.

Funny thing about HDS. When you buy one of their 9960 systems-- a minimum investment of about $250,000-- you get a guarantee. If you ever lose any data at all on that storage system due to hardware or firmware fault, HDS will give you 30% of your purchase price back.

According to one of the senior HDS VPs that I spoke to last month, they've never had to pay out that penalty clause.

Re:PERC? (2)

ivan256 (17499) | more than 12 years ago | (#3173371)

Most PERC boards are AMI MegaRAID cards rebranded.

Re:PERC? (1)

bozoman42 (564217) | more than 12 years ago | (#3175725)

No, the older ones are. The newer ones are rebranded AIC. And the model numbers don't help a lot.

Re:PERC? (1)

uberdood (154108) | more than 12 years ago | (#3187341)

Count yourself lucky. I have problems with the PERC 3 boards in PowerEdge 6400s. If they run out of juice (power outages that last longer than the UPS), they forget about the array. The only fix is to remove the cache SIMM, power up, power down, and reinsert the cache SIMM.

Re:PERC? (2)

SuiteSisterMary (123932) | more than 12 years ago | (#3191827)

Were these PERCs with or without on-card battery backup? I'll also point out that your UPS really should power your systems down when the UPS is at 25 percent charge, or (average shutdown time, including all services)*2, whichever is longer.
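The rule of thumb above can be written down as a small calculation. This is a rough sketch, not anything from a UPS vendor's software: the function name is hypothetical, and it assumes runtime scales linearly with battery charge, which real batteries only approximate.

```python
def shutdown_trigger_seconds(full_runtime_s: float, avg_shutdown_s: float) -> float:
    """Remaining-runtime threshold at which to start a clean shutdown.

    Assumption (a simplification): runtime scales linearly with charge,
    so 25 percent charge corresponds to 25 percent of the full runtime.
    """
    runtime_at_25_percent = 0.25 * full_runtime_s
    doubled_shutdown_time = 2 * avg_shutdown_s
    # "Whichever is longer": take the larger of the two thresholds.
    return max(runtime_at_25_percent, doubled_shutdown_time)


# Hypothetical numbers: 20 minutes of battery, 4 minutes to shut everything down.
print(shutdown_trigger_seconds(full_runtime_s=1200, avg_shutdown_s=240))
```

With those made-up numbers the doubled shutdown time (8 minutes) is the longer threshold, so shutdown would start with 8 minutes of battery left rather than 5.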

Xiotech (1)

torndorff (566594) | more than 12 years ago | (#3172449)

Xiotech, affiliated with Seagate, offers a Magnitude unit in fractional TBs, with highly customizable options. They're linked to servers with QLogic fibre-channel cards and are easy to set up; they even have a knowledgeable tech come on site for the install.

ICP Vortex (0)

Anonymous Coward | more than 12 years ago | (#3172473)

I've had the same sort of problems with Adaptec AAA-133 RAID controllers (which are in reality software-based RAIDs with some HW support from the card), and I've had the RAID management software for another major brand, whose name I do not recall, knock the RAID offline when it crashed; before long the hardware failed too. The only hardware RAID controllers I've had success with (and great success at that) are ICP Vortex controllers. My experiences with all of the above have been under Novell NetWare, but at least from what is claimed on ICP Vortex's web site, they have great Linux support. And with the quality of their monitoring software, hardware, and even BIOS setup utility, I suspect they work perfectly under every operating system. They are also one of the few I've come across that only appear as one PCI device, so they can be used with normal motherboards with devices that do not support IRQ sharing.

Re:ICP Vortex (1)

bozoman42 (564217) | more than 12 years ago | (#3175728)

I've used a multichannel Vortex in setting up a NetApp-like box using Linux and LVM and it's an absolute dream. Would've been highly recommended, except they've been acquired by Intel recently and I don't know what direction they've taken (since I've moved on from that particular jobsite).

IBM HDs (1, Informative)

Anonymous Coward | more than 12 years ago | (#3172479)

I've also had performance problems with IBM drives in RAID 0/1, and especially RAID 5 setups. I contacted IBM tech support to see if any of the settings could be tweaked, but the response was the drives are not RAID optimized. I switched to Seagate drives, and subjectively I would say the performance quadrupled under heavy load.

Re:IBM HDs (1, Interesting)

Anonymous Coward | more than 12 years ago | (#3173353)

I had two new IBM 36-GB drives fail this week on a Dell 2450 with PERC3/Si and RAID 5. Not good. I replaced 'em with Seagate 15k rpm drives and all is better and the performance of the machine seems better, too.

FWIW, I've found the drivers for the PERC in FreeBSD to be far better than those in Linux.

Re:IBM HDs (1)

akharon (4824) | more than 12 years ago | (#3183722)

It's worth noting that the linux drivers were ripped off of FreeBSD. I say ripped off, because that's exactly what they were, taken with no credit given to the author, in violation of the BSD license.

Re:IBM HDs (0)

Anonymous Coward | more than 12 years ago | (#3197364)

bullshit.
The guy who wrote the BSD drivers also wrote the Linux ones.
That's why they look alike.

Re:IBM HDs (0)

Anonymous Coward | more than 12 years ago | (#3201822)

No, you're thinking of the Promise ATA-RAID controller drivers. Those were ripped off from FreeBSD without attribution.

Eurologic (1)

jayrtfm (148260) | more than 12 years ago | (#3172553)

I've used 2 Voyager 3100's [eurologic.com] with the fibre module.
Due to excessive server room heat, we did lose a drive, but data was fine. While it has Windows software to monitor it when connected via scsi, they didn't have anything for unix, so configs had to be done via telnet on its serial port.

Tried Adaptec? (5, Informative)

Judg3 (88435) | more than 12 years ago | (#3172666)

Where I used to work (an all-Windows shop) we used Adaptec [adaptec.com] RAID cards in all our "tower" based servers. Even the lower priced models (AAA-131U2) always performed without a hitch and we never had any problems with them at all. AMI's RAID controllers are real nice and all, but for the price it just wasn't worth it. The Adaptec solutions performed just as well and at a lower cost. You'd do well to check 'em out.

Now the 3200 RAID controllers in the Compaqs, that's a different story altogether.
We had roughly 2000 servers, operating 24/7 @ 67 degrees F. Two times a year we had a site shutdown. Every single time we had to bring everything back up we would have anywhere from 3-5 Compaq array controllers die. But never once did the low-buck Adaptecs crap out on us.

Re:Tried Adaptec? (3, Informative)

Sivar (316343) | more than 12 years ago | (#3180078)

The general consensus on StorageReview.com (a site that I would trust for anything storage related) is that Adaptec cards are crap, the performance under load is mediocre, they tend to die (despite being a solid-state device) and that often times the non-windows drivers aren't the best.
Don't take it from me, ask around there. If they worked for you, however, great. Whatever works.

UPS quality. (0)

King of the World (212739) | more than 12 years ago | (#3172686)

Dude, obviously you don't have adequate UPS support if the device does something when the power goes out. It is possible to get a better UPS that will flick over and back quick enough for all hardware not to notice.

Firmware (4, Informative)

Holophax (21693) | more than 12 years ago | (#3172978)

Just as a shot in the dark, I would suggest trying to upgrade the firmware on the drives first. At one of my old jobs, we used nothing but IBM drives, and we constantly had problems with the drives becoming marked as bad or offline, but simply pulling them and plugging them back in (hot swap) would bring them back. In our situation, we were using IBM Netfinity servers with IBM RAID controllers. When we talked to IBM, they admitted there was a problem with the firmware on the drives: whenever an event happened (even a simple read error), the drive would not spit out just one error but spew them constantly, which made the RAID controller mark the drive as bad. Seeing as a drive firmware upgrade only takes a few minutes of downtime and is non-destructive, it might be worth a shot.

Two possibilities... (4, Interesting)

Vrallis (33290) | more than 12 years ago | (#3173040)

First, are you sure your UPS is a *TRUE* UPS? Even a lot of the 'high end' UPSes out there are still REALLY switched UPSes. This could very well be your problem.

The other one is something I've heard of (I'm not an electrical expert, but I'll try to explain). Larger sites (older installations, particularly) were wired for three-phase electricity. Over time, they split the phases for normal 110 volt usage. If the PC is connected to power on one phase but the external unit is connected to power from a different phase, the differential between the two can cause problems, because of the ground connection between them through the cable shielding. I know, it sounds like something from the BOFH daily calendar, but it does make sense. Try making sure both pieces of equipment are on the same true UPS, or at least on switched UPSes on the same circuit.

Good advice above. (2)

Futurepower(tm) (228467) | more than 12 years ago | (#3173424)


Sounds like good advice in the post above.

Some UPSs switch. Some are always online. You want the latter for a RAID array.

The second paragraph is important. Check your input power. Everything attached to your network should be wired to the same power circuit. Otherwise there is a possibility for feeding large spurious signals to your hardware through the power line.

Re:Good advice above. (3, Insightful)

walt-sjc (145127) | more than 12 years ago | (#3195198)

Ahhh! NO!!! Do NOT NOT NOT put everything on one circuit. First, computers with switching power supplies (almost 100% are) are NON-linear in power usage. They draw LARGE spikes of current sporadically. Second, if you blow a circuit, EVERYTHING YOU HAVE goes down. BAD BAD BAD! Third, if you run dual power supplies on your equipment, a power problem / spike on the circuit will affect both power supplies, not even counting that 50% of the benefit of dual power supplies is so that you have power redundancy.

As others have stated, make sure you have a true "online" UPS, but ALSO make sure that you don't run over 50% power utilization on the UPS, due to the non-linear nature of switching power supplies.
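A quick way to check that headroom is to add up the loads and compare them to the UPS rating. This is only a sketch: the function names, wattages, and UPS rating below are all invented for illustration, and real sizing would also need to account for VA versus watts.

```python
def ups_utilization(load_watts, ups_capacity_watts):
    """Fraction of the UPS's rated capacity consumed by the attached loads."""
    return sum(load_watts) / ups_capacity_watts


def within_headroom(load_watts, ups_capacity_watts, limit=0.5):
    """True if the total load stays at or below the chosen utilization limit."""
    return ups_utilization(load_watts, ups_capacity_watts) <= limit


# Hypothetical loads in watts on a hypothetical 1500 W UPS.
servers = [300, 250, 150]
print(f"utilization: {ups_utilization(servers, 1500):.0%}")
print("within 50% headroom:", within_headroom(servers, 1500))
```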

Of course the BEST power stability solution is to use all 48VDC equipment like Telco's do. When was the last time your phone went down due to telco hardware failure? Note that most Major hardware vendors have 48VDC versions of their equipment (Sun, Cisco, etc.)

Clarification (3, Informative)

Futurepower(tm) (228467) | more than 12 years ago | (#3195438)


Everything needs to be on the same Ground circuit. It is necessary to avoid ground loops.

"They draw LARGE spikes of current sporadically."

I don't think this is correct. I have designed power supplies, and I don't immediately think of any reason why the power input of a switching power supply should vary differently from the power output. The only surge is when the hard disks spin up, but with SCSI there is a means to stagger the spin-up.

Re:Two possibilities... (3, Informative)

RatOmeter (468015) | more than 12 years ago | (#3176718)

"First, are you sure your UPS is a *TRUE* UPS?"

The term you're looking for here is "On-line UPS". There are two basic varieties of UPS, switched and on-line. Both share the following common features: the AC (mains) power coming into the UPS is rectified (converted to DC, usually in the range of 24 to 48 VDC). The DC is used to charge the batteries, which are the source of backup power when the mains fails. AC backup power is supplied to your equipment by an inverter (DC to AC converter) in the UPS, which takes the battery's DC juice and "builds" a 50 or 60 Hz AC sine or pseudo-sine wave at the right voltage.

Switched UPS: When the AC mains is OK, your equipment is being powered by it. When the mains fails, the UPS literally switches to backup power from the inverter. This switching takes a measurable amount of time to complete and relies on your equipment's electronics to ride through the loss of power until the switch to inverter power is complete. Advantage? Switched UPSes are generally less expensive.

Online UPS: Regardless of whether the mains power is OK or not, the UPS's inverter is already on and already supplying your equipment. When the AC mains does fail (momentary loss, glitch, blackout or brownout), it takes zero time to switch to UPS power, because your equipment was already on UPS power! Advantages? (1) Zero switching time, (2) the online UPS will feed a constant, glitch-free sine wave to your equipment at the right frequency and the right RMS voltage all the time.


Re:Two possibilities... (0)

Anonymous Coward | more than 12 years ago | (#3179345)

"...Advantage? Switched UPS's are generally less expensive..." Also, online UPSes use way more power to provide a given amount of output power. That inverter is always kickin', making waste heat and making the electricity bill higher. In the case of this article, it sounds like the RAID would fail even if the power was perfect. Definitely a firmware/hardware problem.

Re:Two possibilities...(Dumb ASS!) (0)

Anonymous Coward | more than 12 years ago | (#3188549)

All 110v circuits are split off of two or three phase.

Where are you from, some place where they generate using some type of static electrical machine?

Don't use host based RAID (2)

ivan256 (17499) | more than 12 years ago | (#3173405)

Unless you're limited by cost, don't use host based RAID. It will always be less reliable than a dedicated RAID controller. If you must use host based RAID, try to find a card that supports RAID 0/1, because it's faster and more reliable. I've had good experiences with MegaRAID cards and the IBM host based RAID controllers, but by good experience I mean that I've only had a few problems. There is always a chance that something will get screwed up when you change your setup.

ICP Vortex + Cheetah (1)

Zurk (37028) | more than 12 years ago | (#3173421)

Use a high end ICP Vortex controller with 15K RPM Cheetah SCSI drives or Fujitsu drives. It's the only combo I trust in any of my PC servers.
Alternatively you could try Sun's A3500 FCAL drive arrays with the 15K Cheetahs for non-PC hardware.

Compaq is good. (3, Interesting)

NetJunkie (56134) | more than 12 years ago | (#3173668)

When I took over my current job the last network team had overloaded the circuits in the server room. We've had 3 circuits trip and had servers drop hard. None of the Compaq SmartArray controllers had any problems recovering.

I suggest you also fix your power problem. The systems should have no idea power was lost to the building. If you are using a UPS and this is still happening, I'd find a better one.

Re:Compaq is good. (0)

Anonymous Coward | more than 12 years ago | (#3199565)

Compaq is the last company I would rely on when it comes to storage/RAID solutions.
I had nothing but trouble with their equipment.
When it comes to cheap PC storage, Dell does a much better job, ranging from small disk arrays to real EMC FC arrays that are truly open to platforms other than Windows NT.

Compaq and HP Raid's are great - Dell is horrible. (0)

Anonymous Coward | more than 12 years ago | (#3174007)

I've never had trouble with Compaq and HP Raid-5's but Dell Raids randomly kill themselves (i.e. marking one drive as offline, and the whole raid dies... and Dell support doesn't know how the stuff works....)

Re:Compaq and HP Raid's are great - Dell is horrib (0)

Anonymous Coward | more than 12 years ago | (#3191713)

The Dell PERC logic (I forget the company that actually makes the cards, as Dell just rebrands them) is written primarily by Intel. My co-worker's husband wrote a lot of code for the 3/ series. I'll be sure to tell her to give her husband some shit. =)

3ware (1)

Wolfkin (17910) | more than 12 years ago | (#3174215)

I've used 3ware Escalade cards repeatedly, and never had any problems. I've only actually used RAID 5 once, but so far no worries with it. Of course, these are IDE RAID cards, which may not be acceptable if you have lots of SCSI drives already.

Randall.

IBM ServeRaid (2, Informative)

decep (137319) | more than 12 years ago | (#3174251)

I have built several RAID configurations with IBM ServeRAID controllers. One RAID 5 array (16 drives, 1 hot spare) that I've managed has had 2 drives fail in the past year; the only thing I've had to do is take the bad drive out and pop another one in, and it is automatically marked as a hot spare.

I was expecting a hassle, but it was mind-blowing to see how easy it was. The cross-platform remote management utility is a plus too.

Sun A1000 (1)

Monkelectric (546685) | more than 12 years ago | (#3175801)

If you want bulletproof and are willing to pay for it, you won't go wrong with a Sun A1000.

http://store.sun.com/catalog/doc/BrowsePage.jhtml?cid=22455&parentId=67713

They range in size from 75 GB to 436 GB. I work for an EDU so we get almost a 50% discount on them, but they are worth every penny... we've had the 200+ GB model running for almost 3 years now with no problems at all.

Re:Sun A1000 (2)

nbvb (32836) | more than 12 years ago | (#3178905)

I really hope you're kidding.

The A1000's stink. The firmware is awful; the RM6 management software is worse!

Be careful upgrading your firmware (which you need to do from time to time) -- the controller _can_ deadlock. And of course, if it does, you lose all your data, since the only copy of the LUN configuration is in the controller.

Seriously. They're crap. Built on the same crap as the A3000/3500 series. It's all old, re-branded Symbios stuff. Yuck-o.

You'd be better off getting an A5200 tray (or D1000 tray) and using the RAID-5 functions of Veritas Volume Manager instead. It actually has a shot at working :)

--NBVB

Re:Sun A1000 (1)

cprice (143407) | more than 12 years ago | (#3184018)

Ditto. RM6 stands for Raid Mangler 6 in our parts.
I have about 10 A1000 and 30 D1000 in production
and I'll take the simplicity of the D1000 jbod configs over Raid Mangler.

Re:Sun A1000 (1)

spinlocked (462072) | more than 12 years ago | (#3184298)

You'd be better off getting an A5200 tray (or D1000 tray) and using the RAID-5 functions of Veritas Volume Manager instead. It actually has a shot at working :)

I hope you're kidding.

Software RAID 5 on arrays with no cache? Heavens no, it sucks. Read performance sucks pretty bad considering the number of drives involved in the stripes, and write performance is worse than dreadful even on high end machines. Write performance gets *even* worse the more drives you add unless you go across arrays - even then it just sucks. It's better on Veritas than DiskSuite, but not much. Mirror; don't use RAID 5 on anything other than A3x00, A1000 or T3. It's especially good on the T3, where the XORs are done on the controller and it's almost as fast as striping.
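For readers wondering what "the XORs" refers to: RAID 5 parity is the byte-wise XOR of the data blocks in a stripe, which is what lets a lost block be rebuilt. The toy sketch below is not how Veritas, DiskSuite, or any controller actually implements it; the block contents and names are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical data blocks making up one stripe on three data drives.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose the second block and rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# A small write must update parity too: new parity = old parity XOR old data
# XOR new data. That means reading old data and old parity, then writing both.
new_block = b"ZZZZ"
new_parity = xor_blocks([parity, data[1], new_block])
assert new_parity == xor_blocks([data[0], new_block, data[2]])
```

The last step shows why small writes are expensive without a cache: each one turns into two reads and two writes, plus the XOR work on the host if there is no controller to do it.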

I agree though, RM6 is pretty bad but if managed properly it's deployable. I know of one of Sun's customers who threw out terabytes of A5x00 storage after the GBIC debacle - as in deposited on the pavement outside of Sun's City of London office - only to replace them with A1000's and lots of them.

Re:Sun A1000 (1)

walt-sjc (145127) | more than 12 years ago | (#3195290)

Yup. I was one of the early adopters of the A5000. Pile of donkey doo. I had an entire team of Sun engineers out after escalating up to the VP level trying to get my setup working reliably. Not to mention that they won't even fit in a normal sized rack without taking the side panels off, and they are about a mile long (makes it a PAIN to use in a caged datacenter environment - you can't get around the equipment.)

since you didn't rule out software based (0)

Anonymous Coward | more than 12 years ago | (#3179819)

have you tried vinum or the linux raidtools?

i am running a vinum based r5 setup across 30 18G FC drives (i wget'd sandin's site as soon as i saw an article about his setup on bp6.com) without any problems.

then again, power going out where i work (a power plant) is rather unlikely =)
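As a back-of-the-envelope check on a setup like that, RAID 5 gives up one drive's worth of space to parity. The sketch below assumes all 30 drives sit in a single RAID 5 group with no hot spares, which the post does not actually say; the function name is made up.

```python
def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Usable capacity of a single RAID 5 group: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


# Assumed layout: thirty 18 GB drives in one group, no spares.
print(raid5_usable_gb(30, 18), "GB usable")
```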

Mylex for linux raid (1)

damien_kane (519267) | more than 12 years ago | (#3185603)

Where I work, all we use is Mylex cards for both 4 and 6 drive RAID 5 implementations. We use IBM drives, but because of bad experiences lately (6 of them blowing up) we've recently switched to Seagate Cheetahs. You have to drop your kernel down (we run stably under 2.2.12, and have had problems getting it to work on 2.2.18), but if you're running Linux-based servers, Mylex is the way to go. You can get both 32 and 64 bit PCI cards, and at only about 3-4 grand CDN a pop, it isn't that costly for a hardware RAID 5.

Speaking from experience (0)

Anonymous Coward | more than 12 years ago | (#3185775)

Hitachi or EMC storage array. Built in UPS that will flush the cache to disk as a last resort before power can fail. And hitachi tells me they'll do linux. Awesome performance too. Money? If you have to ask....
--Chris

we had this exact problem! (1)

gkanai (148625) | more than 12 years ago | (#3187061)

We had this exact problem on our servers at work and it was a real headache getting them upgraded to the new firmware. It's a serious problem and it's imperative that you upgrade to the newest rev. of the firmware, not just the patch.

Intel's site has a technical advisory dated Jan 29th, 2002 regarding drives being "marked offline".

http://support.intel.com/support/motherboards/server/ta_445.htm

What about RaidZone? (0)

Anonymous Coward | more than 12 years ago | (#3188808)

What about RaidZone? [raidzone.com]


It's supposed to be fast, cheap, and reliable....

AMI MegaRAID, Mylex eXtremeRAID, ICP Vortex (1)

dmelomed (148666) | more than 12 years ago | (#3194658)

These are under $300 on eBay, work great, and have many features. You'll have a better compatibility experience with AMI cards. Mylex cards have more features, but the older eXtremeRAID models have proprietary memory modules (which will cost $1000 retail if you want to upgrade, if you can find one somewhere).

ICP Vortex has a great reputation, though I don't have any experience with them.

XML is the best place to start (2)

graveyhead (210996) | more than 12 years ago | (#3197460)

In this situation, I use XML. I invent my own markup language that is self-consistent and describes the API of a system. I then use an XSLT processor, Apache Xalan [apache.org] to be precise, to transform the source to various other formats including: a web site, one big printable web page, PDF, and I've been thinking about writing a stylesheet for man pages as well.



The only issue with a system like this is version control of your source files, which is highly situation specific.


Re:XML is the best place to start (1)

graveyhead (210996) | more than 12 years ago | (#3197493)

I apologise but I don't know how this post ended up in this thread. I posted to "Ask Slashdot: Beginning Project Documentation?", but forgot to login. It prompted me to login... I submitted and it ended up here!?!?!

Raidtec.. (1)

Chicane-UK (455253) | more than 12 years ago | (#3200478)

We have used Raidtec boxes for quite a long time, and they have always been very reliable.

I think all of our Raidtecs are kitted out with Seagate drives.. anyway, check out http://www.raidtec.com for a little more information on what they sell.

One excellent solution.... (1)

a9db0 (31053) | more than 12 years ago | (#3202262)

... is Compaq.

I've probably set up over 100 servers over the last 10 years or so, and I wouldn't use anything but Compaq Array controllers. I've never lost data because of a drive subsystem problem. I've got over 20 that I'm responsible for now, and all of them use Compaq Array controllers. They are reliable, easy to configure, well supported, and easy to maintain. The tools under NetWare and Windows work well. Most are supported under Linux. They aren't cheap, but they are simply great.

For details look here. [compaq.com]

I have worked for one large regional financial institution, and one large entertainment conglomerate, and one of the things they have in common is that both use Compaq hardware. There's a good reason - it works.

FWIW, I do not now, nor have I ever worked for Compaq, nor do I have any direct investment in Compaq.