
Are RAID Controllers the Next Data Center Bottleneck?

Soulskill posted more than 5 years ago | from the many-varied-pipes dept.


storagedude writes "This article suggests that most RAID controllers are completely unprepared for solid state drives and parallel file systems, all but guaranteeing another I/O bottleneck in data centers and another round of fixes and upgrades. What's more, some unnamed RAID vendors don't seem to even want to hear about the problem. Quoting: 'Common wisdom has held until now that I/O is random. This may have been true for many applications and file system allocation methodologies in the recent past, but with new file system allocation methods, pNFS and most importantly SSDs, the world as we know it is changing fast. RAID storage vendors who say that IOPS are all that matters for their controllers will be wrong within the next 18 months, if they aren't already.'"


171 comments


first! (-1, Flamebait)

Anonymous Coward | more than 5 years ago | (#28819381)

first!

I/O is random? What have you been smoking? (1, Interesting)

Anonymous Coward | more than 5 years ago | (#28819435)

It is very common when doing disk benchmarks to have separate tests for small random reads/writes and large sequential reads/writes. The numbers are often different.

And while you can't always predict what disk sector is going to be read next, often you can, which is why predictive raid controllers with lots of memory are very useful.

I think we need a mod option to mod down the article summary: -1, stupid editor.

Re:I/O is random? What have you been smoking? (3, Insightful)

countertrolling (1585477) | more than 5 years ago | (#28820069)

I think we need a mod option to mod down the article summary: -1, stupid editor.

You had your chance [slashdot.org] .

Re:I/O is random? What have you been smoking? (5, Insightful)

Anpheus (908711) | more than 5 years ago | (#28820333)

All the important operations tend to be random. For a file server, you may have twenty people accessing files simultaneously. Or a hundred, or a thousand. For a webserver, it'll be hitting dozens or hundreds of static pages and, if you have database backend, that's almost entirely random as well.

For people consolidating physical servers to virtual servers, you now have two, three, ten or twenty VMs running on one machine. If every one of those VMs tries to do a "sequential" IO, it gets interlaced by the hypervisor into all the other sequential IOs. No hypervisor would dare tell all the other VMs to sit back and wait so that every IO is sequential. That delay could be seconds or minutes or hours.

Now imagine all that, and take into account that the latest Intel SSD gets around 6600 IOPS read and write. A good, fast hard drive gets 200. So you could put thirty three hard drives in RAID 0 and have the same number of IOPS, and your latency would still be worse. All the RAID0 really does for you is give you a nice big queue pipeline, like in a CPU. Your IO doesn't really get done faster, but you can have many more running simultaneously.

Given that SSDs are easily three to four times faster on sequential IO and an order of magnitude faster on random IO, I don't think it's that implausible to believe that the industry isn't ready.
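A quick back-of-the-envelope sketch of the poster's numbers (the 6,600 and 200 IOPS figures are the comment's, not measured here):

```python
# Rough math from the comment above: how many ~200 IOPS hard drives
# does it take to match one ~6,600 IOPS SSD in a RAID0 stripe?
ssd_iops = 6600   # figure quoted for the latest Intel SSD
hdd_iops = 200    # figure quoted for a good, fast hard drive

drives_needed = ssd_iops / hdd_iops   # striping scales IOPS roughly linearly
print(drives_needed)   # 33.0

# Striping does not improve per-operation latency: each individual IO
# still waits on one spinning disk's seek and rotation.
hdd_latency_ms = 1000 / hdd_iops   # ~5 ms per op on one saturated drive
ssd_latency_ms = 1000 / ssd_iops   # ~0.15 ms per op
print(round(hdd_latency_ms, 2), round(ssd_latency_ms, 2))   # 5.0 0.15
```

Which is the poster's point: the stripe matches the SSD's throughput but never its latency.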

Re:I/O is random? What have you been smoking? (1)

symbolset (646467) | more than 5 years ago | (#28821195)

Agreed. For VM image files you may want to consider something else. The new PCIe-attached SSD cards come in sizes up to 1TB and deliver over 250,000 IOPS. Streaming is likewise fast, and latency is very low. Which is nice.

Re:I/O is random? What have you been smoking? (1)

Sillygates (967271) | more than 5 years ago | (#28821211)

So you could put thirty three hard drives in RAID 0 and have the same number of IOPS, and your latency would still be worse.

Actually, that's incorrect. Here's why:

When you calculate IOPS, a good portion of small reads and writes get executed at random places on the disks. When you make one filesystem write on a RAID0 set (depending on how smart the RAID0 controller is), it will lock up several or ALL of the disk spindles for that individual read/write.

The IOPS are negligibly better on a 33-disk RAID0 set, and depending on your disk controller, it might be worse (every write equates to 33 DMA requests).

It is faster for reading large files, but that is NOT what a fair IOPS test measures.

For read operations, you can double your read IOPS by using a mirror. This is because any semi-decent controller will split the read requests between the drives in the mirror. When you're issuing lots of read requests from several threads, the load should be approximately equal across the drives.

Re:I/O is random? What have you been smoking? (2)

wagnerrp (1305589) | more than 5 years ago | (#28821557)

When you calculate IOPS, a good portion of small reads and writes get executed at random places on the disks. When you make one filesystem write on a RAID0 set (depending on how smart the RAID0 controller is), it will lock up several or ALL of the disk spindles for that individual read/write.

Actually, that's incorrect. Here's why:

When you make a RAID0 array, you stripe large blocks between all the disks, usually 64K-256K large. If your operation does not cross the block boundary, you only access a single drive. Assuming those random small files are evenly distributed, your IOPS scale almost linearly with drive count.
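A minimal sketch of that block-boundary argument (the 64K chunk size and four-disk array here are hypothetical parameters, not anything from the thread):

```python
# Which member disks does a RAID0 request touch, given a stripe-unit
# ("chunk") size? A small random read that fits inside one chunk hits
# exactly one drive, which is why random IOPS scale with drive count.
CHUNK = 64 * 1024   # 64K stripe unit (assumed)
DISKS = 4           # number of member disks (assumed)

def disks_touched(offset, length, chunk=CHUNK, ndisks=DISKS):
    """Return the set of disk indices a request [offset, offset+length) hits."""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return {stripe % ndisks for stripe in range(first, last + 1)}

print(disks_touched(10_000, 4096))        # 4K read inside chunk 0 -> {0}
print(disks_touched(60_000, 16_384))      # crosses the 64K boundary -> {0, 1}
print(len(disks_touched(0, 1_000_000)))   # large sequential read -> 4 (all disks)
```

So the previous poster's "locks up ALL the spindles" scenario only happens for requests much larger than the chunk size.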

Re:I/O is random? What have you been smoking? (1)

Dahamma (304068) | more than 5 years ago | (#28821303)

Good points, though of course some problems are more a matter of server design/allocation than any gross inadequacy on the part of the RAID controller. You can always try faster hardware to solve a performance problem, but a lot of the time it's just due to bad software/configuration.

For example, no one in their right mind would share physical disks between 10-20 VMs in any application where disk performance is critical - a good server architect builds a system that works with the hardware available. Problem is, plenty of these applications/servers are not built by people in their right mind :)

Re:I/O is random? What have you been smoking? (1)

BikeHelmet (1437881) | more than 5 years ago | (#28822169)

I'd rather have an ioDrive.

See: http://hothardware.com/Articles/Fusionio-vs-Intel-X25M-SSD-RAID-Grudge-Match/?page=9 [hothardware.com]

With ludicrously high IOPS, your CPU doesn't have to do much waiting, which pretty much defeats any RAID solution. RAID usually raises overhead, because your CPU has to decide which device the requests go to - unless you use expensive hardware RAID controllers, all of which have IOPS caps. Most RAID solutions also go through slower interfaces - although compared to PCIe 2.0 4x, every interface(SATA1/2/3, USB2/3, etc.) is slow.

HDDs are impressive tech, but they have a different purpose: density and longevity. SSDs are really going to shine for database stuff in the future. Prices are dropping rapidly; affordable SSDs are almost here!

distibution (1)

ArsonSmith (13997) | more than 5 years ago | (#28819447)

with things like Hadoop and CloudStore, pNFS, Lustre, and others, storage will be distributed. There will no longer be the huge EMC, NetApp, Hitachi, etc. central storage devices. There's no reason to pay big bucks for a giant single point of failure when you can use the Linus method of uploading to the internet and letting it get mirrored around the world. (In a much more localized manor.)

Re:distibution (3, Insightful)

bschorr (1316501) | more than 5 years ago | (#28819535)

That's fine for some things but I really don't want my confidential client work-product mirrored around the world. Despite all the cloud hype there is still a subset of data that I really do NOT want to let outside my corporate walls.

Re:distibution (2, Informative)

Ex-MislTech (557759) | more than 5 years ago | (#28819651)

This is correct; there are laws on the books in most countries that prohibit exposing medical and other data to risk by putting it out in the open. Some have even moved to private virtual circuits, and SANs with solid state storage for fast access to active files work fine, moving less-accessed data to drive storage, which is nonetheless quite fast, and SAS technology is faster than SCSI tech in throughput.

Re:distibution (2, Informative)

lgw (121541) | more than 5 years ago | (#28819979)

SAS technology is faster than SCSI tech in throughput

"SCSI" does not mean "parallel cable"!

Sorry, pet peeve, but obviously Serial Attached SCSI [wikipedia.org] (SAS) is SCSI. All Fibre Channel storage speaks SCSI (the command set), and all USB storage too. And iSCSI? Take a wild guess. Solid state hard drives that plug directly into PCIe slots with no other data bus? Still the SCSI command set. Fast SATA drives? The high-end ones often have a SATA-to-SCSI bridge chip in front of SCSI internals (and SAS can use SATA cabling anyhow these days).

Pardon me, I'll just be over here grumbling about this.

Re:distibution (1, Funny)

Anonymous Coward | more than 5 years ago | (#28820107)

"SCSI" does not mean "parallel cable"!

Ok, yes, you're correct.

But the common meaning of SCSI is "parallel SCSI", because for most of SCSI's existence parallel SCSI was the only option.

Similarly, ATA does technically include both parallel ATA and serial ATA. But the common meaning of ATA is parallel ATA, because for most of ATA's existence parallel ATA was the only option.

Pardon me, I'll just be over here grumbling about this.

You kids get off my lawn!

Re:distibution (1)

sjames (1099) | more than 5 years ago | (#28820425)

SCSI started life as a command set AND a physical signaling specification. The physical layer has evolved several times, but until recently was easily recognizable as a natural evolution of the original parallel SCSI. At the cost of performance degradation and additional limitations (such as number of devices), the generations of SCSI have interoperated with simple adapters.

SAS uses the same command set, but the physical layer is a radical departure (that is, it bears no resemblance) from the original SCSI and its descendants. Arguably, if you're going to call SAS SCSI because of the command set, you'll have to also call USB SCSI. We call drives and controllers that speak the SCSI command set over fibre Fibre Channel. Drives that speak the SCSI command set over a high-speed serial layer are called SAS. Drives that speak the SCSI command set over USB are called USB. The devices can only interoperate with active translation; a simple connector adapter won't do it.

We call controllers and drives that speak the ATA command set over a fast serial bus SATA. As it turns out, because the command sets are so similar and the physical specs are close, SAS controllers are bilingual and can also speak the ATA command set to SATA drives.

So, no, SAS drives are not obviously SCSI any more than a USB drive is obviously SCSI. SAS devices obviously speak the SCSI command set and are obviously targeted as the successor to SCSI.

Re:distibution (1)

lgw (121541) | more than 5 years ago | (#28821705)

I remember the days when people reading Slashdot wanted to use precise terminology about technology - don't you? Sure you do. But go on with your "Serial Attached SCSI drives are not SCSI" and your "I double-clicked on the internet, but it's broken" and so on. Those of us who are still nerds will pedantically point out that all these storage technologies are "really SCSI drives, if you look closely," and we'll be right. Grumble grumble grumble.

Re:distibution (1)

mysidia (191772) | more than 5 years ago | (#28819805)

You could encrypt all the data using AES-512 or something stronger with various keys.

So you mirror the data all around the world, while concentrating on securing the encryption keys, which are a lot smaller than the data, and easier to distribute to secure locations only.

Re:distibution (2)

lgw (121541) | more than 5 years ago | (#28820093)

For my own personal data, I'd consider that adequate. For data I'm legally required to keep secret - absolutely not. Your physical security design should force an attacker to steal both your keys and your data, each from a separate physical location, so that you can destroy one as soon as the other is stolen to prevent data loss. Electronic security of course focuses on compartmentalization and auditing, so that an inside attacker can only steal a small portion of the data, and can be caught and jailed afterwards. That's all pretty basic design.

Also, 256-bit symmetric encryption really is enough - it's firmly beyond the realm of what can be brute-forced, unless some fundamental understanding of physics is wrong. 256-bit AES is only vulnerable to weaknesses in the algorithm being discovered at some future point. If you're paranoid, you're far better off using 2 unrelated 256-bit symmetric algorithms than a symmetric key larger than 256 bits.

Re:distibution (1)

mysidia (191772) | more than 5 years ago | (#28820175)

It's more convenient to have the data in multiple places and divide the keys.

A 512-bit key basically provides you a security guarantee.

You can divide your 512-bit keys in half and place half of the bits for each key in different places. Either you use two 256-bit keys and just string the bits together, or you XOR the key with a 512-bit random number and store only that random number in one place and the XOR result in the other place.

Then your security is actually greater than if you just secured the data and an encryption key separately; you have 3 isolated places where you have keys or data.

If a compromise of any 1 becomes known to you, you destroy all 3.

Also, you could use different keys for each different place you have stored a copy of the data (that means you encrypt the data again every time you store it in a different place).

That way, should you destroy one location's keys, you retain redundant access to your other copies of the data.
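The XOR variant described above can be sketched in a few lines (a toy illustration of the splitting scheme, not a vetted secret-sharing implementation):

```python
# Split a key into two shares: a random one-time pad, stored in one
# location, and key XOR pad, stored in another. Neither share alone
# reveals anything about the key; XORing them back together recovers it.
import secrets

def split_key(key: bytes):
    pad = secrets.token_bytes(len(key))              # random pad, same length
    masked = bytes(k ^ p for k, p in zip(key, pad))  # key XOR pad
    return pad, masked

def join_key(pad: bytes, masked: bytes) -> bytes:
    return bytes(p ^ m for p, m in zip(pad, masked))

key = secrets.token_bytes(64)          # a 512-bit key
pad, masked = split_key(key)
assert join_key(pad, masked) == key    # both shares together recover the key
```

Destroying either share (or the data) then makes the other shares useless, which is the "compromise of any 1, destroy all 3" property the poster describes.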

Re:distibution (1)

lgw (121541) | more than 5 years ago | (#28820345)

A 512-bit key basically provides you a security guarantee.

Only salesmen talk about guarantees in security. Everything is vulnerable, it's just a question of effort.

There are several multi-key solutions you can buy from reputable vendors for at-rest data encryption (3 of 5 keycards needed, or 2 of 5, or whatever). That's a good approach to protection against insiders. It wouldn't justify making the at-rest data publicly accessible, nor failing to compartmentalize access.

And yeah, the "store the key and the data in different buildings" approach is just one security consideration; you obviously want data mirroring, backups, a disaster recovery plan, etc., etc. But you want the same degree of security for each copy of the data, not the formerly common "ultra-secure data backed up to unencrypted tapes and sent by UPS" approach.

Re:distibution (0)

Anonymous Coward | more than 5 years ago | (#28820291)

256-bit AES is only vulnerable to weaknesses in the algorithm being discovered at some future point.

You're also vulnerable to initially choosing a weak key with not a lot of randomness. Many encryption systems have used strong algorithms but fallen victim to this flaw.

Debian recently had this kind of flaw in SSL certificate & key generation.

Re:distibution (1)

SuperQ (431) | more than 5 years ago | (#28820091)

Uhh, are you dense? Distributed storage doesn't mean you use someone else's servers. The software mentioned above is for internal use. Hadoop is used by Yahoo for their internal cloud, and Lustre is used by a number of scientific labs that do military work.

Re:distibution (1)

sjames (1099) | more than 5 years ago | (#28820151)

That's what strong crypto is for.

Re:distibution (1)

rubycodez (864176) | more than 5 years ago | (#28819691)

eh, properly designed systems using the big disk arrays certainly don't have a single point of failure. And their data is replicated to other big disk arrays in other locations. That's why they cost "the big bucks". Your cloud is fine for relatively low-speed low-security read-mostly data, but not for high-volume financial and healthcare systems

Re:distibution (1)

Chris Daniel (807289) | more than 5 years ago | (#28820127)

In a much more localized manor

We're going to start putting data centers in big houses now?

Wait. You mean my SAN is Dead? (4, Insightful)

mpapet (761907) | more than 5 years ago | (#28819481)

Hardware RAID's are not exactly hopping off the shelf and I think many shops are happy with fiberchannel.

Let's do another reality check: this is enterprise-class hardware. Are you telling me you can get an SSD RAID/SAN in a COTS package at a cost comparable to whatever is available now? Didn't think so....

Let's face it, in this class of hardware things move much more slowly.

Re:Wait. You mean my SAN is Dead? (1)

jon3k (691256) | more than 5 years ago | (#28820289)

Cost per IOPS, yes; several vendors are selling SSD now. Cost per terabyte, no, SSD isn't even close. What we're seeing now is a Tier 0 storage using SSDs. It fits in between the RAM cache in SAN controller nodes and online storage (super fast, typically fibre channel, vs. near-line).

So previously it looked like (slowest to fastest): SATA (near-line), Fiber Channel (online) -> RAM cache

Now we'll have: SATA -> FC -> SSD -> RAM

And in a few years after the technology gets better and much less expensive, we'll see: SATA -> SSD -> RAM

And hopefully eventually: SSD -> Memristors :)

That's ok (0)

Anonymous Coward | more than 5 years ago | (#28819505)

This article suggests that most RAID controllers are completely unprepared for solid state drives and parallel file systems

Right. The point of a parallel file system is that you do not need RAID. Slashdot's editors must think really low of their readers.

Re:That's ok (1)

Jafafa Hots (580169) | more than 5 years ago | (#28820185)

The READERS think low of the readers, why should the editors be any different?

Re:That's ok (1)

Troy Baer (1395) | more than 5 years ago | (#28822841)

The point of a parallel file system is that you do not need RAID.

Really? Why has virtually every production parallel file system implementation I've ever seen (using GPFS, Lustre, and PVFS) been done on top of hardware RAID controllers?

BAD MATH (5, Interesting)

adisakp (705706) | more than 5 years ago | (#28819631)

FTA: "Since a disk sector is 512 bytes, requests would translate to 26.9 MB/sec if 55,000 IOPS were done with this size. On the other end of testing for small block random is 8192 byte I/O requests, which are likely the largest request sizes that are considered small block I/O, which translates into 429.7 MB/sec with 55,000 requests."

I'm not going to believe an article that assumes that because you can do 55K IOPS for 512-byte reads, you can do the same number of IOPS for 8K reads, which are 16X larger, and then just extrapolates from there. Especially since most SSDs (at least SATA ones) right now top out around 200MB/s and the SATA interface tops out at 300MB/s. Besides, there are already real-world articles out there where guys with simple RAID0 SSDs are getting 500-600 MB/s with 3-4 drives using motherboard RAID, much less dedicated hardware RAID.
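For what it's worth, the article's arithmetic itself checks out; the quoted MB/sec values are MiB-based and simply assume the 55,000 IOPS rate holds at any request size, which is exactly the assumption the poster objects to:

```python
# Reproducing the article's throughput figures from its stated IOPS.
# Both numbers assume IOPS is constant regardless of request size.
iops = 55_000
small_mib = iops * 512 / 2**20    # 512-byte sector requests
large_mib = iops * 8192 / 2**20   # 8192-byte "small block" requests

print(round(small_mib, 1))   # 26.9  (matches the article)
print(round(large_mib, 1))   # 429.7 (matches the article)
```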

Re:BAD MATH (4, Insightful)

fuzzyfuzzyfungus (1223518) | more than 5 years ago | (#28820033)

"simple RAID0 SSDs are getting 500-600 MB/s with 3-4 drives using motherboard RAID, much less dedicated hardware RAID."

The last part of that sentence is particularly interesting in the context of this article. "Motherboard RAID" is, outside of the very highest-end motherboards, usually just bog-standard software RAID with just enough BIOS goo to make it bootable. Hardware RAID, by contrast, actually has its own little processor and does the work itself. Of late, general-purpose microprocessors have been getting faster, and cores in common systems have been getting more numerous, at a substantially greater rate than hardware RAID cards have been getting spec bumps (outside of the super high-end stuff; I'm not talking about whatever EMC is connecting 256 fibre channel drives to, I'm talking about anything you could get for less than $1,500 and shove in a PCIe slot). Perhaps more importantly, the sophistication of OS support for nontrivial multi-disk configurations (software RAID, ZFS, storage pools, etc.) has been getting steadily greater and more mature, with a good deal of competition between OSes and vendors. RAID cards, by contrast, leave you stuck with whatever firmware updates the vendor deigns to give you.

I'd be inclined to suspect that, for a great many applications, dedicated hardware RAID will die (the performance and uptime of a $1,000 server with a $500 RAID card will be worse than a $1,500 server with software RAID, for instance) or be replaced by software RAID with coprocessor support (in the same way that encryption is generally handled by the OS, in software, but can be supplemented with crypto accelerator cards if desired).

Dedicated RAID of various flavors probably will hang on in high-end applications (just as high-end switches and routers typically still have loads of custom ASICs and secret sauce, while low-end ones are typically just embedded *nix boxes on commodity architectures); but the low end seems increasingly hostile.

Re:BAD MATH (1)

SuperQ (431) | more than 5 years ago | (#28820121)

It doesn't matter that SATA can do 300MB/s. That's just the interface line rate. Last time I benchmarked 1TB drives (Seagate ES.2), they topped out at around 100MB/s. Drives still have a long way to go before they saturate the SATA bus. The only way that happens is if you are using port multipliers to reduce the number of host channels.

Re:BAD MATH (2, Interesting)

jon3k (691256) | more than 5 years ago | (#28820355)

You forgot about SSDs, consumer versions of which are already doing over 250MB/s reads for less than $3.00/GB. And we're still essentially talking about second-generation products (Vertex switched from JMicron to Indilinx controllers, and Intel basically just shrunk down to 34nm for their new ones, although their old version did 250MB/s as well).

I'm using a 30GB OCZ Vertex for my main drive on my windows machine and it benchmarks around 230MB/s _AVERAGE_ read speed. It cost $130 ($4.30/GB) when I bought it a couple months ago, and prices are falling. The new Intel X25-M is $225 for 80GB ($2.81/GB).

Re:BAD MATH (1)

jon3k (691256) | more than 5 years ago | (#28820311)

Vertex (with Indilinx controllers) and Intel (even the "cheap" MLC drives from both vendors that are less than $3.00/GB) are seeing 250MB/s-270MB/s actual real-world results for reads. The actual throughput of SATA 3G is slightly less than 300MB/s, so essentially we're at the limitation of SATA 3G, or very, very close; too close for comfort.

Re:BAD MATH (1)

Rockoon (1252108) | more than 5 years ago | (#28820911)

Remember that Intel entered this market as a tiger out for blood, with their *first* SSD throwing data at just under the SATA300 cap. This isn't a coincidence.

When SATA600 goes live, expect Intel and OCZ to jump right up to the 520MB/sec area as if it were trivial to do so... (because it is!)
Fusion-io has a PCIe flash solution that goes several times faster than these SATA300 SSDs. The problem is SATA. The problem is SATA. The problem is SATA.

Re:BAD MATH (1)

drsmithy (35869) | more than 5 years ago | (#28821985)

Besides there are already real world articles out there where guys with simple RAID0 SSD's are getting 500-600 MB with 3-4 drives using Motherboard RAID much less dedicated harware RAID.

It is unlikely "dedicated hardware RAID" would be meaningfully faster.

enterprise storage (3, Insightful)

perlchild (582235) | more than 5 years ago | (#28819635)

Storage has been the performance bottleneck for so long that it's a happy problem if you actually must increase the bus speeds/CPU processors/get faster memory on RAID cards to keep up. Seems to me the article (or at least the summary) was written by someone who hadn't been following enterprise storage for very long...

Re:enterprise storage (1, Insightful)

Anonymous Coward | more than 5 years ago | (#28819823)

Damn straight! IO has been the bottleneck for at least 40 years. SSD is slowly opening doors to a brighter future, but we're a long way from the realistic capacity needs of business. Although I've yet to see real benchmarking designed for hundreds of simultaneous tasks, all the figures I see are largely rubbish, assuming the user does one or two things. How about testing them on web services like Digg, or on company mail servers, instead of fake throughput and "feel" tests?

Re:enterprise storage (1)

Chang (2714) | more than 5 years ago | (#28821811)

Or on a large VM cluster - which thousands of data centers have in production now.

Re:enterprise storage (2, Interesting)

ZosX (517789) | more than 5 years ago | (#28820027)

That's kind of what I was thinking too. When you really start pushing the 300MB/s SATA gives, it's hard to find something to complain about. Most of my hard drives max out at like 60-100MB a second, and even the 15,000 RPM drives are not a great deal faster. Low latency, fast speeds, increased reliability. This could get interesting in the next few years. Heck, why not just build a RAID0 controller into the logic card with a SATA connection, break the SSD into a bunch of little chunks, and RAID0 them all for max performance right out of the box, so you get the performance advantages of RAID without the cost of a card and the waste of a slot? PCIe SSD is quite interesting too..........

Re:enterprise storage (1)

jon3k (691256) | more than 5 years ago | (#28820387)

"Heck why not just build a raid 0 controller into the logic card with a sata connection and break the ssd into a bunch of little chunks and raid 0 them all"

Cost, mostly; you'd need tons of controllers, cache, etc. Plus you can already nearly saturate SATA 3G with any decent SSD (Intel, Vertex, etc.), so it's kind of pointless. The new Vertex and Intel SSDs are benchmarking at 250MB/s. No point in making them much faster until we have SATA 6G.

Re:enterprise storage (1)

drsmithy (35869) | more than 5 years ago | (#28822011)

Heck why not just build a raid 0 controller into the logic card with a sata connection and break the ssd into a bunch of little chunks and raid 0 them all max performance right out of the box so you get the performance advantages of raid without the cost of a card and the waste of a slot?

Because an error anywhere nukes the whole shebang.

Re:enterprise storage (4, Interesting)

HockeyPuck (141947) | more than 5 years ago | (#28820073)

Ah... pointing the finger at the storage... My favorite activity. Listening to DBAs, application writers, etc., point the finger at the EMC DMX with 256GB of mirrored cache and 4Gb/s FC interfaces. You point your finger and say, "I need 8Gb FibreChannel!" Yet when I look at your HBA utilization over a 3-month period (including quarter end, month end, etc.), I see you averaging a paltry 100MB/s. Wow. Guess I could have saved thousands of dollars by going with 2Gb/s HBAs. Oh yeah, and you have a minimum of two HBAs per server. Running a Nagios application to poll our switch ports for utilization, the average host is running maybe 20% utilization of the link speed, and as you beg, "Gimme 8Gb/s FC," I look forward to your 10% utilization.

We've taken whole databases and loaded them into dedicated cache drives on the array, and surprise, no performance increase. DBAs and application writers have gotten so used to yelling "Add hardware!" that they forgot how to optimize their applications and SQL queries.

If storage was the bottleneck, I wouldn't be loading up storage ports (FAs) with 10-15 servers. I find it funny that the only devices on my 10,000 port SAN that can sufficiently drive IO are media servers and the tape drives (LTO-4) that they push.

If storage was the bottleneck there would be no oversubscription in the SAN or disk array. Let me know when you demand a single storage port per HBA, and I'm sure my EMC will take us all out to lunch.

I have more data than you. :)

Re:enterprise storage (4, Insightful)

Anonymous Coward | more than 5 years ago | (#28820205)

Ah... pointing the finger at the storage... My favorite activity. Listening to DBAs, application writers, etc., point the finger at the EMC DMX with 256GB of mirrored cache and 4Gb/s FC interfaces. You point your finger and say, "I need 8Gb FibreChannel!" Yet when I look at your HBA utilization over a 3-month period (including quarter end, month end, etc.), I see you averaging a paltry 100MB/s. Wow. Guess I could have saved thousands of dollars by going with 2Gb/s HBAs. Oh yeah, and you have a minimum of two HBAs per server. Running a Nagios application to poll our switch ports for utilization, the average host is running maybe 20% utilization of the link speed, and as you beg, "Gimme 8Gb/s FC," I look forward to your 10% utilization.

You do sound like you know what you're doing, but there is quite a difference between average utilization and peak utilization. I have some servers that average less than 5% usage on a daily basis, but will briefly max out the connection about 5-6 times per day. For some applications, more peak speed does matter.
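The average-vs-peak point is easy to illustrate with synthetic numbers (an entirely made-up load pattern, one sample per minute over a day):

```python
# A link that idles at ~5% but saturates for a few short bursts each day
# reports a tiny average utilization even though peak demand hits 100%.
samples = [0.05] * (24 * 60)   # minute-by-minute utilization, 5% baseline
for burst_start in (100, 400, 700, 1000, 1300):   # five 3-minute bursts
    for minute in range(burst_start, burst_start + 3):
        samples[minute] = 1.0

avg = sum(samples) / len(samples)
peak = max(samples)
print(f"avg={avg:.1%} peak={peak:.0%}")   # avg ~6%, peak 100%
```

A monitoring graph built from long averaging windows would show only the ~6% line, hiding the moments when the link actually mattered.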

Re:enterprise storage (1)

swb (14022) | more than 5 years ago | (#28820377)

In my experience, DBAs and their fellow travelers in the application group like to point their finger at SANs and virtualization and scream about performance, not because the performance isn't adequate but because SANs (and virtualization) threaten their little app/db server empire. When they no longer "need" the direct attached storage, their dedicated boxes get folded into the ESX clusters and they have to slink back into their cubicles and quit being server & networking dilettantes.

Re:enterprise storage (0)

Anonymous Coward | more than 5 years ago | (#28821053)

In my experience, DBAs and their fellow travelers in the application group like to point their finger at SANs and virtualization and scream about performance, not because the performance isn't adequate but because SANs (and virtualization) threaten their little app/db server empire. When they no longer "need" the direct attached storage, their dedicated boxes get folded into the ESX clusters and they have to slink back into their cubicles and quit being server & networking dilettantes.

amen!

Re:enterprise storage (4, Insightful)

Slippy. (42536) | more than 5 years ago | (#28820557)

Sort of true, but not entirely accurate.

Is the on-demand response slow? Stats lie. Stats mislead. Stats are only stats. The systems I'm monitoring would use more I/O if they could. Those basic read/write graphs are just the start. How's the latency? Any errors? Is the pathing setup good? Are the systems queuing I/O requests while waiting for I/O service response?

And traffic is almost always bursty unless the link is maxed - you're checking out a nice graph of the maximums too, I hope? That average looks mighty deceiving when long periods are compressed. At an extreme over months or years, data points can be days. Overnight + workday could = 50%. No big deal on the average.

I have a similar usage situation on many systems, but the limits are generally still storage-dependent issues like I/O latency (apps make a limited number of requests before requests start queuing), poorly grown storage (a few LUNs there, a few here, and everything is suddenly slowing down due to striping in one oversubscribed drawer), and sometimes unexpected network latency on the SAN (switch bottlenecks on the path to the storage).

Those graphs of i/o may look pitiful, but perhaps that's only because the poor servers can't get the data any faster.

Older enterprise SAN units (even just 4 or 5 years ago) kinda suck performance wise. The specs are lies in the real world. A newer unit, newer drives, newer connects and just like a server, you'll be shocked. What'cha know, those 4Gb cards are good for 4Gb after all!

Every year, there's a few changes and growth, just like in every other tech sector.

Re:enterprise storage (1)

7213 (122294) | more than 5 years ago | (#28820973)

Slippy you are spot on sir,

Looking at your SAN utilization and seeing HBA throughput of next to nothing is not necessarily proof it's the app or db. As a storage admin, I'd love to say it's never our fault, but clearly if you're spending all your time looking at the network and not the disk utilization itself, you're looking in the wrong place. I agree that 8Gb and even 4Gb links for disk HBAs are usually way overkill (on average), but that often is because the spindles on the backend (or the server itself) can't service the load being pushed by the app or db. I rarely even bother looking at perf stats on my switches, as I know they will be under 50% utilized. But when I see my disk response times going well into the double digits, I know that the switches are not at issue and we may need to address the disk layout.

As for the death of RAID, I also see its day coming. I don't mean to sound like an IBM fanboi (I'm not), but their XIV product looks like they've got the right idea for anti-raid (I think HP's EVA does this to a lesser extent). Band as many cheap ol' SATA disks together as possible, and be mad paranoid about mirroring it. Need more capacity? No worries, pop it in and we'll start moving data around to reduce access density automagically. Tiering your storage is a bad idea that needs to die (micro-management), as to lower your access density you end up short stroking. Instead put that low-IO NAS device right next to your high-IO financial system on disk and use both the capacity & IO ability of the drive effectively. I personally don't see the death of centralized storage coming anytime soon; it's just that 1) RAID needs to die as we spread IO over more & more larger & slower per-GB spindles, and 2) for the near term (5+ years) that SSD stuff is going to be used as a second-level cache at best due to cost per GB. Either with the disk array using it as a second-stage cache, or intelligently written apps doing it for themselves (likely better for everyone).

The 'cloud' is a lie for most enterprise class apps, at least when you get to the DB level. (p.s. I also get frustrated with the app & DBA prima donnas, but it's even WORSE on the rare occasion when they are right ;-) )

Re:enterprise storage (1)

perlchild (582235) | more than 5 years ago | (#28821021)

Well boy did I not expect this kind of reaction... I'm kinda on your side, really. I meant, here's someone that's saying that SSDs means you're no longer starving for spindles... And I say "well that's good, they were holding us back, we can do something better now, that's not a problem." On the other hand, it seems it's a lot more loaded politically in places that don't do this with just three admins, and no dedicated storage admins, so I'll just shut up now cuz I hate politics. You guys have a nice day.

Re:enterprise storage (1)

markk (35828) | more than 5 years ago | (#28822045)

Who cares about average use? The cost is driven by the PEAK use. That is why the average use for HBAs is almost nothing, but you are paying double the money or more because of the 8 hours a month you need to smoke. And woe betide the architect who suggests postponing a business meeting for 48 hours every month so he can save $20 million a year. Seriously.

Re:enterprise storage (1)

natas (105637) | more than 5 years ago | (#28822301)

Which nagios plugin are you using to poll this? We are running into the same exact situation where I work: Oracle DBAs screaming that it's IO, but our SAN guys don't see it.

Re:enterprise storage (0, Flamebait)

Gothmolly (148874) | more than 5 years ago | (#28822403)

Running a nagios application to poll our switchports for utilization, the average host is running maybe 20% utilization of the link speed

You sound gay.

Re:enterprise storage (1)

marcosdumay (620877) | more than 5 years ago | (#28822493)

Well, that was unexpected for me too. And you know, you are right. Real-world applications behave quite differently from how academic models say they would; that is because the models don't account for team limitations and the unavoidable mistakes (from the techies and from HR) that add up to a very significant cost on any project.

Too bad I hadn't let that academic misconception go yet. That is why I was surprised.

Hardware RAID becoming less relevant every day. (1, Insightful)

Vellmont (569020) | more than 5 years ago | (#28819639)

The first question is really, why RAID a SSD? It's already more reliable than a mechanical disk, so that argument goes out the window. You might get some increased performance, but that's often not a big factor.

The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

Re:Hardware RAID becoming less relevant every day. (0)

Anonymous Coward | more than 5 years ago | (#28819677)

You tell my boss that RAID 1+0, 3, 5 or 6 is a waste of time! They sell ECC RAM for a reason as well. If an SSD craps out, which they WILL do (just look at reviews on Newegg for proof of that), you'll need a RAID level with redundancy to fix it.

Re:Hardware RAID becoming less relevant every day. (0)

Anonymous Coward | more than 5 years ago | (#28819707)

General purpose CPUs are bad at calculating parity in comparison to dedicated hardware.

There will still be a want for RAID as it provides a measure of data integrity without full duplication of data. e.g. with four drives, RAID 5 loses 1/4 of the space to parity instead of the 1/2 that mirroring costs.
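The space tradeoff above can be sketched as a quick calculation (assuming equal-sized drives; the drive counts are just examples):

```python
def usable_fraction(n_drives, level):
    """Fraction of raw capacity left for data, for equal-sized drives."""
    if level == "raid1":                      # mirroring: half the space
        return 0.5
    if level == "raid5":                      # one drive's worth of parity
        return (n_drives - 1) / n_drives
    if level == "raid6":                      # two drives' worth of parity
        return (n_drives - 2) / n_drives
    raise ValueError(f"unknown level: {level}")

print(usable_fraction(4, "raid5"))  # 0.75 -- lose 1/4 to parity with 4 drives
print(usable_fraction(4, "raid1"))  # 0.5  -- lose half to mirroring
```

Note the RAID 5 overhead shrinks as the array grows: with 8 drives only 1/8 of the space goes to parity.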

Re:Hardware RAID becoming less relevant every day. (2, Informative)

Alain Williams (2972) | more than 5 years ago | (#28819727)

The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

I much prefer s/ware raid (Linux kernel dm_mirror); it removes a complicated piece of h/ware which is just another thing to go wrong. It also means that you can see the real disks that make up the mirror and so monitor them with the smart tools.

OK: if you do raid5 rather than mirroring (raid1) you might want a h/ware card to offload the work to, but for many systems a few terabyte disks are big and cheap enough to just mirror.

Re:Hardware RAID becoming less relevant every day. (1)

potHead42 (188922) | more than 5 years ago | (#28819779)

It also means that you can see the real disks that make up the mirror and so monitor it with the smart tools.

With 3ware RAID controllers this is already possible, you just have to specify the magic device /dev/twa0 (for the first controller) and use the smartd/smartctl option "-d 3ware,0", where 0 specifies the disk number. I assume other controllers have something similar.
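A sketch of scripting that kind of per-disk SMART polling (the `/dev/twa0` device and `-d 3ware,N` option are from the comment above; the port count and health-check flag `-H` are assumptions for illustration):

```python
import subprocess

def smart_health_cmd(disk_no, device="/dev/twa0"):
    """Build the smartctl invocation for disk N behind a 3ware controller."""
    return ["smartctl", "-H", "-d", f"3ware,{disk_no}", device]

# Poll the first four ports; in practice you'd let smartd do this on a schedule.
for n in range(4):
    cmd = smart_health_cmd(n)
    # subprocess.run(cmd)  # uncomment on a box that actually has the controller
    print(" ".join(cmd))
```

The same pattern works for other passthrough types smartctl supports; only the `-d` argument changes.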

But yeah, I also prefer software RAID, especially when using ZFS ;-)

Re:Hardware RAID becoming less relevant every day. (2, Informative)

mysidia (191772) | more than 5 years ago | (#28820075)

Well, ZFS is great, but don't get that mixed up with software RAID. It's not. The storage redundancy algorithms used by ZFS are not the standard RAID algorithms, and using ZFS is much better than using EITHER hardware or software RAID.

ZFS provides performance and data integrity assurance that standard RAID does not. Primarily, because filesystem-level data is checksummed, it should be almost impossible for silent data corruption at the storage device level to go unnoticed, except in cases where the corrupted data actually still matches the checksums (a later 'zpool scrub' should detect it, if ZFS is implemented properly).
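A toy sketch of that checksum idea (not ZFS code; just a per-block model showing how a scrub catches silent corruption that the device itself never reports):

```python
import hashlib

class ChecksummedStore:
    """Store each block alongside a SHA-256 of its contents, ZFS-style."""
    def __init__(self):
        self.blocks = {}

    def write(self, addr, data: bytes):
        self.blocks[addr] = (data, hashlib.sha256(data).hexdigest())

    def scrub(self):
        """Return addresses whose data no longer matches its checksum."""
        return [a for a, (d, c) in self.blocks.items()
                if hashlib.sha256(d).hexdigest() != c]

store = ChecksummedStore()
store.write(0, b"important data")
# Simulate silent corruption: the device returns flipped bits, checksum unchanged.
data, cksum = store.blocks[0]
store.blocks[0] = (b"importent data", cksum)
print(store.scrub())  # [0] -- the scrub flags the corrupted block
```

A plain RAID mirror would happily return either copy here, since neither the drive nor the controller knows which one is right.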

But aside from ZFS, software RAID (and even fakeraid/hostraid hardware adapters that perform RAID in the driver) really sucks in terms of reliability, data integrity, and performance when you need to push things to the maximum, compared to a good hardware RAID controller; software RAID is measurably slower on the same CPU and memory.

SMART provides so little of what you need to be doing to keep a reliable array, it isn't even funny.

Good hardware controllers keep metadata and do frequent consistency checks / "scrubs" / surface scans, to ensure every bit of data is periodically read from every drive, so HDD firmware has an opportunity to fix errors before they become "unrecoverable read errors".

Hardware controllers will also detect when a hard drive is having a problem that cannot be easily identified by software. Hard drives are directly plugged into the controller; it can detect things such as abnormal command response latencies.

A software controller can't be sure the abnormal latency isn't due to other workload on the bus, or "not a drive failing", so the HW controller is more responsive to failure.

HW controllers also provide write-through caching, and sometimes have a BBU with a full write-back cache, which drastically helps performance and reduces the RAID performance penalty; software RAID doesn't mitigate that penalty, and in fact makes it worse.

Oh yes, and Good controllers also have monitoring and administration tools for various OSes, including Linux, Windows, and Solaris, produced by the manufacturer.

Many of the good controllers come equipped with audible alarms and terminals for you to plug drive failure LEDs into, so that anyone near the server can know a drive has failed, and which one.

Re:Hardware RAID becoming less relevant every day. (1)

PiSkyHi (1049584) | more than 5 years ago | (#28820641)

You've added a piece of hardware to do RAID, which may have more bells and whistles, but all I really want is an email when a drive fails. The drive is nothing compared to the data, so all I need is a controller that supports hotplug. Silicon Image makes one that should be standard on most MBs but isn't yet, because hardware RAID is an industry trying to stay alive.

If the controller fails, with hardware RAID I'm looking at wasting time and a lot of cash to get that data back online. With software RAID, a replacement controller is a no-brainer.

Performance? Dedicate a cheap PC to the array; you can always change your mind with software.

Getting firmware that can merely read and write a drive at some point in the future is always going to be easier than managing updates for application-level RAID management software.

Seriously, the email is enough bells and whistles for a storage array; I hope no one in your work area has to sit near the actual thing.

Re:Hardware RAID becoming less relevant every day. (1)

mysidia (191772) | more than 5 years ago | (#28821319)

The chance of a controller failing is almost negligible; it's similar to the chance of a NIC or CPU failing, and hard drives fail much more often. If you stick with a standard common controller type for all your servers (e.g. all HP DL3xx or Dell PE 29xx servers with embedded controllers), getting a spare should be easy, cheaper than the offline spare HD you should be keeping to restore redundancy to the array, and the broken controller should be covered by warranty.

Well, servers belong on server racks in closed rooms; they're so loud, that if someone had to sit near it, the fan noise from all the servers would be overwhelming.

And just getting an e-mail has the problem of not identifying precisely which drive has failed.

If you call up the datacenter tech (remote hands) and tell them to swap out the drive with the spare, there's a chance they'll accidentally pull the wrong drive, or pull the right drive from the wrong server.

Visible indications tend to be pretty useful in avoiding mistakes, and it's a good idea to take every reasonable precaution in assuring mistakes don't happen, if the server uptime is important: if it's not, why use RAID? Just swap the drive and load the backup, that procedure is a lot more reliable than hot plugging.

Re:Hardware RAID becoming less relevant every day. (0)

Anonymous Coward | more than 5 years ago | (#28819951)

> OK: if you do raid5 ... .. you deserve to be shot.

Re:Hardware RAID becoming less relevant every day. (1)

duguk (589689) | more than 5 years ago | (#28821003)

> OK: if you do raid5 ... .. you deserve to be shot.

There's nothing wrong with RAID5 in the right circumstances (large home server?), but if you use it instead of a backup you deserve to be shot.

Re:Hardware RAID becoming less relevant every day. (1)

adisakp (705706) | more than 5 years ago | (#28819741)

The first question is really, why RAID a SSD? It's already more reliable than a mechanical disk, so that argument goes out the window. You might get some increased performance, but that's often not a big factor.

The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

RAID0 for speed. SSDs in RAID0 can perform 2.5-3X faster [hothardware.com] than a single drive. A RAID SSD array can challenge the speed of a FusionIO card [hothardware.com] that costs several thousand dollars.

Now that the new faster 34nm Intel SSDs can be preordered for under $250, it's reasonable for an enthusiast to buy 3-4 of them and throw them in a RAID0 array. Also, software (or built-in MB RAID) is fine -- a lot of sites have shown that 3 SSD drives is the sweet spot for price/performance using standard MB RAID controllers. If you want 4 or more to keep scaling performance, you need a pricier separate controller card.
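That sweet spot can be sketched numerically (the per-drive rate and controller ceiling below are assumptions for illustration, not benchmarks of any specific part):

```python
def raid0_throughput(n_drives, drive_mbs, controller_cap_mbs):
    """Ideal RAID0 streaming rate: drives scale linearly until the controller caps out."""
    return min(n_drives * drive_mbs, controller_cap_mbs)

# Hypothetical: 250 MB/s per SSD behind a motherboard controller good for ~700 MB/s.
for n in range(1, 6):
    print(n, raid0_throughput(n, 250, 700))
# Three drives (750 -> capped at 700) already saturate the controller; a fourth
# adds nothing, which is why going past 3 calls for a pricier dedicated card.
```

The same min() logic applies to IOPS, just with different per-drive and per-controller numbers.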

Re:Hardware RAID becoming less relevant every day. (1)

Rockoon (1252108) | more than 5 years ago | (#28819927)

Don't forget about the "Battleship MTRON" guys that raided up 8 MTRON SSDs (the fastest SSDs at the time) several years ago and then had a lot of trouble actually finding a RAID controller that could handle the bandwidth. This year's SSDs are twice as fast, and expect performance to double again within 12 months.

Re:Hardware RAID becoming less relevant every day. (1)

jon3k (691256) | more than 5 years ago | (#28820415)

I'll take 4 drives in RAID10 please :)

Re:Hardware RAID becoming less relevant every day. (1)

Yert (25874) | more than 5 years ago | (#28819811)

Because I can't buy a 26TB SSD, but I can put 52 500GB SSDs in two CoreRAID chassis and mount them as one filesystem... as opposed to the 2 Sun storage arrays we use now, which are fiber-attached and starting to get a little... slow. SSDs would give us 10x the I/O headroom.

Re:Hardware RAID becoming less relevant every day. (1)

TerminaMorte (729622) | more than 5 years ago | (#28819843)

In the enterprise, it doesn't matter that the disk is less likely to fail; redundancy for HA is worth the extra cost.

If someone is spending the money on SSD then performance had better be a big factor!

Re:Hardware RAID becoming less relevant every day. (1)

amRadioHed (463061) | more than 5 years ago | (#28822657)

That's only true to a point. If the reliability of the SSD gets to the point where it's about as likely as the RAID controller to fail, then the RAID controller is just an extra point of failure that will not increase your availability at all. However, AFAIK SSDs aren't that reliable yet so the RAID controllers are still worth it.

the on board chips are not build for high speed / (1)

Joe The Dragon (967727) | more than 5 years ago | (#28820097)

The onboard chips are not built for high speed / using all the ports at the max at one time.

Re:Hardware RAID becoming less relevant every day. (1)

Helmholtz (2715) | more than 5 years ago | (#28820111)

This is where ZFS has some potential to become even more important than it already is.

The reason you RAID a SSD is to protect against silent data corruption, which SSDs are not immune from. While you don't necessarily need RAID for this with ZFS, it certainly makes it easier.

The point about the insane abundance of CPU power is one that ZFS specifically takes advantage of right out of the starting gate.

Re:Hardware RAID becoming less relevant every day. (1)

darkjedi521 (744526) | more than 5 years ago | (#28820629)

I have data sets spanning multiple terabytes. One recent PhD graduate in the lab I support accumulated 20 TB of results during his time here. Even if I had highly reliable SSDs that never failed, I'd still toss the SSDs together in a zpool to get the capacities I need to accommodate a single data set. RAID is not just about redundancy. With SSDs, I'd probably use RAID5 instead of RAID6 just in case I had a freak bad drive, but RAID in some form is here to stay.

Re:Hardware RAID becoming less relevant every day. (1)

drsmithy (35869) | more than 5 years ago | (#28822051)

The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

Transparency and simplicity. It's a lot easier dealing with a single device than a dozen.

Real DC (0)

Anonymous Coward | more than 5 years ago | (#28819645)

Run FiberChannel

iscsi, 10gig (1)

Colin Smith (2679) | more than 5 years ago | (#28819711)

Multiple interfaces and lots of block servers.

Does anyone actually still use NFS?


Re:iscsi, 10gig (2, Informative)

Anonymous Coward | more than 5 years ago | (#28820025)

Of course. NFS provides an easy to use concurrent shared filesystem that doesn't require any cluster overhead or complication like GFS or GPFS.

Re:iscsi, 10gig (2, Informative)

drsmithy (35869) | more than 5 years ago | (#28822121)

Does anyone actually still use NFS?

Of course. It's nearly always fast enough, trivially simple to setup, and doesn't need complicated and fragile clustering software so that multiple systems can access the same disk space.

Not quite (3, Informative)

greg1104 (461138) | more than 5 years ago | (#28819763)

There may need to be some minor rethinking of controller throughput for read applications on smaller data sets for SSD. But right now, I regularly saturate the controller or bus when running sequential R/W tests against a large number of physical drives in a RAID10 array, so it's not like that's anything new. Using SSDs just makes it more likely that will happen even on random workloads.

There are two major problems with this analysis though. The first is that it presumes SSD will be large enough for the sorts of workloads people with RAID controllers encounter. While there are certainly people using such controllers to accelerate small data sets, you'll find just as many people who are using RAID to handle large amounts of data. Right now, if you've got terabytes of stuff, it's just not practical to use SSD yet. For example, I do database work for living, and the only place we're using SSD right now is for holding indexes. None of the data can fit, and the data growth volume is such that I don't even expect SSDs to ever catch up--hard drives are just keeping up with the pace of data growth.

The second problem is that SSDs rely on volatile write caches in order to achieve their stated write performance, which is just plain not acceptable for enterprise applications where honoring fsync is important, like all database ones. You end up with disk corruption if there's a crash [mysqlperformanceblog.com], and as you can see in that article, once everything was switched to only relying on non-volatile cache, the performance of the SSD wasn't that much better than the RAID 10 system under test. The write IOPS claims of Intel's SSD products are garbage if you care about honoring write guarantees, which means it's not that hard to keep up with them after all on the write side in a serious application.
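A minimal sketch of the contract at issue (this only exercises the fsync path on a local file; real tools like diskchecker.pl kill power mid-run and verify the file afterwards, which no in-process test can simulate):

```python
import os
import tempfile

def durable_append(path, record: bytes):
    """Append a record and refuse to return until the OS says it's on stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)  # if the drive's write cache lies, this guarantee is broken
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "journal.log")
for i in range(3):
    durable_append(path, f"record {i}\n".encode())
with open(path, "rb") as f:
    print(f.read().decode(), end="")
```

The whole corruption argument above is about what happens between `os.fsync` returning and the bits actually reaching flash: a drive with a volatile cache can acknowledge the sync and then lose the data on power failure.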

Mod Parent Up (1)

sirwired (27582) | more than 5 years ago | (#28819913)

The fact that SSD perf drops like a rock when you actually need to be absolutely sure the data makes it to disk is a huge factor in enterprise storage. No enterprise storage customer is going to accept the possibility that their data goes down the bit-bucket just because somebody tripped over the power cord. Enterprise databases are built around the idea that when the storage stack says data has been written, it has, in fact, been written. Storage vendors spend a great deal of money, effort, and complexity guaranteeing the non-volatility of write cache; for SSD vendors to ignore that requirement when publishing performance data is fundamentally dishonest.

SirWired

Re:Not quite (2, Insightful)

A beautiful mind (821714) | more than 5 years ago | (#28820745)

The second problem is that SSDs rely on volatile write caches in order to achieve their stated write performance, which is just plain not acceptable for enterprise applications where honoring fsync is important, like all database ones. You end up with disk corruption if there's a crash, and as you can see in that article once everything was switched to only relying on non-volatile cache the performance of the SSD wasn't that much better than the RAID 10 system under test. The write IOPS claims of Intel's SSD products are garbage if you care about honoring write guarantees, which means it's not that hard to keep with them after all on the write side in a serious application.

Most enterprise-level SSDs have BBWC [google.com] already for exactly that reason. On those systems fsync is effectively a no-op. I for one am looking forward to SSDs in enterprise-level applications; we could easily consolidate current database servers that are IOPS bottlenecked, with very low levels of CPU and non-caching memory utilization. BBWC solves the "oh, but we need to honour fsync" kind of problems. We're looking at a performance increase of 10-20x (IOPS) easily if >500G enterprise-level SSDs become available for database servers. Even if prices/GB stay way above SAN prices, it's still more than worth it to switch.

Re:Not quite (2, Insightful)

greg1104 (461138) | more than 5 years ago | (#28821025)

You can't turn fsync into a complete no-op just by putting a cache in the middle. An fsync call on the OS side that forces that write out to cache will block if the BBWC is full, for example, and if the underlying device can't write fast enough without its own cache being turned on, you'll still be in trouble.

While the cache in the middle will improve the situation by coalescing writes into the form the SSD can handle efficiently, the published SSD write IOPS numbers are still quite inflated relative to what you'll actually see. What I was trying to suggest is that the performance gap isn't nearly as large as TFA suggests once you start building real-world systems around them. After all, regular disks benefit from the write combining to lower seeks you get out of a BBWC too, even more than the SSDs do.

The other funny thing you discover if you benchmark enough of these things is that a regular hard drive confined to only use as much space as an SSD provides is quite a bit faster too. When you limit a 500GB SATA drive to only use 64GB (a standard bit of short stroking [tomshardware.com]), there's a big improvement in sequential and seek speeds. If you want to be fair, you should only compare your hard drive's IOPS when it's configured to provide only as much space as the SSD you're comparing against.
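A crude model of why short stroking helps (the settle time, full-stroke time, and linear travel assumption are all hypothetical illustration, not measurements of any drive):

```python
def avg_seek_ms(used_frac, settle_ms=2.0, full_stroke_ms=8.0):
    """Crude seek model: a fixed settle time plus travel time proportional to
    the fraction of the platter actually in use (all numbers hypothetical)."""
    return settle_ms + full_stroke_ms * used_frac

full = avg_seek_ms(1.0)        # whole 500 GB in use
short = avg_seek_ms(64 / 500)  # only the outer 64 GB in use, per the example above
print(round(full, 2), round(short, 2))
```

The head simply never has to travel far when all the data lives in a narrow band, and the outer tracks also stream faster, which is the sequential-speed half of the effect.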

Re:Not quite (0)

Anonymous Coward | more than 5 years ago | (#28820967)

Parent has it right. There are good things to use SSDs for (I can't talk about them), but flat-out replacement of RAID arrays isn't one of them.

Re:Not quite (1)

AllynM (600515) | more than 5 years ago | (#28821199)

First a quick clarification: Intel X25 series SSDs do not use their RAM as a data writeback cache. Intel ships racks full of both M and E series drives, with those drives living in a RAID configuration. They couldn't pull that off if the array was corrupted on power loss. The competition had to start using large caches to reduce write stutters and increase random write performance, mostly in an attempt to catch up to Intel.

The parent article is a bit 'off' as far as bandwidth vs. IOPS on RAID controllers. You can saturate even the best PCI-e RAID cards with only spinning disks. I'm currently pegging an Areca with 10 1TB 5400 RPM drives. The ultimate bandwidth is not limited by bus speed - it is the speed of the internal data pipelines within the card itself. I have yet to see a RAID card pull anywhere close to the theoretical 2 GB/sec possible over PCI-e x8. The 24-drive crazy Samsung RAID video that's floating around required three different RAIDs going in parallel to hit 2 GB/sec.

What people also need to realize is that high end RAID cards were built around a theory of using a large cache and a dedicated processor to handle XOR calculations for RAID-5 and 6. Even the best performing cards will, at best, perform on-par with a high IOPS SSD like an X25 series.

The parent article also speaks briefly of Native Command Queuing, hinting that it is not implemented in RAID cards. This is flat out wrong:

1. Only very high end cards properly implement NCQ at the host and drive level (i.e. Areca):
http://www.pcper.com/article.php?aid=695&type=expert&pid=6 [pcper.com]

2. Only some SSDs implement NCQ beyond a queue depth of about 4 (i.e. Intel).
http://www.pcper.com/article.php?aid=750&type=expert&pid=8 [pcper.com]

The *real* reason even the best RAID hardware does not scale properly with SSD usage is the fact that a good RAID card has an upper IOPS limit matching just *one* SSD. Adding more SSDs only increases throughput, and it takes roughly half the number of SSDs to saturate a given controller (as compared to using HDDs).

The parent article heavily confuses 'streaming' with 'IOPS'. A given RAID card can 'stream' just as well with either HDDs or SSDs. Where 'IOPS' comes into the equation is how far your average throughput drops as those requests become more random in nature. Random accesses cause the RAID controller to have to juggle more data. Here is an example: Placing an X25-M G2 behind an Areca RAID card will result in a *reduction* in IOPS, but no change in sequential throughput. The RAID card processor simply can't juggle the commands as fast as if that same X25-M G2 was connected to the motherboard controller directly. With a single SSD outmaneuvering the RAID controller, adding more SSDs only helps the RAID scale in sequential throughput, not IOPS.

For SSDs to behave properly behind a RAID, the entire RAID process needs to be rethought. You don't need a bunch of writeback cache and a bulky controller architecture. You need a very lightweight XOR engine with *no* cache. The best example of this is creating a RAID of SSDs on an Intel ICH-10R controller. IOPS scales beautifully. 3 or 4 X25s on an ICH-10R will even outmaneuver an ioDrive, and gives several times the IOPS performance of any RAID card.

Allyn Malventano
Storage Editor, PC Perspective

Re:Not quite (1, Informative)

Anonymous Coward | more than 5 years ago | (#28821979)

First a quick clarification: Intel X25 series SSDs do not use their RAM as a data writeback cache. Intel ships racks full of both M and E series drives, with those drives living in a RAID configuration. They couldn't pull that off if the array was corrupted on power loss.

While it would be nice if this were true, since Intel's FAQ [intel.com] references a write cache and database-oriented tests like the one I referenced show data corruption, the paranoid (which includes everyone who works on database and similar enterprise apps) have to presume there's still a problem until some trustworthy studies to the contrary appear. Please let me know if you're aware of any. Your argument of "they couldn't pull that off" is not a data point, because millions of hard drives with a lying write cache are shipped every year to people who think they're just fine, and who don't experience corruption on power loss. Those same drives show corruption just fine if you do a database-oriented corruption test on them.

Until I see SSD vendors giving very clear statements about their write caching and they start passing tests specifically aimed at discovering this type of corruption, you have to assume that the situation with them is just as bad as it's always been with regular IDE or SATA disks--drives lie. The only such test I've seen so far using the Intel drives is from Vadim, the X25-E failed. It would be great if the coverage you were doing at PC Perspective, expanded to cover this issue fully; write-cache enabled? [jasonbrome.com] , diskchecker.pl [livejournal.com] , and faking the sync [livejournal.com] have good introductions to this issue and how to run such tests yourself.

All wrong. (2, Informative)

sirwired (27582) | more than 5 years ago | (#28819797)

1) Most high-end RAID controllers aren't used for file serving. They are used to serve databases. Changes in filesystem technology don't affect them one bit, as most of the storage allocation decisions are made by the database.
2) Assuming that an SSD controller that can pump 55k IOPS w/ 512B I/Os can do the same w/ 4K I/Os is stupid and probably wrong. That is Cringely math; could this guy possibly be that lame?
3) The databases high-end RAID arrays get mostly used for do not now, and never have, used much bandwidth. They aren't going to magically do so just because the underlying disks (which the front-end server never even sees) can now handle more IOPS.

All SSDs do is flip the Capacity/IOPS equation on the back end. Before, you ran out of drive IOPS before you ran out of capacity. Now, you get to run out of capacity before you run out of IOPS on the drive side.

Even if you have sufficient capacity (due to the rapid increase in SSD capacity), you are still going to run out of IOPS capacity on the RAID controller before you run out of IOPS or bandwidth on the drives. The RAID controller still has a lot of work to do with each I/O, and that isn't going to change just because the back-end drives are now more capable.
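The flipped equation can be sketched as a min() over the stack (every figure below is a hypothetical placeholder, not a spec for any real controller or drive):

```python
def array_limits(n, drive_iops, drive_gb, ctrl_iops):
    """Which limit you hit first: usable IOPS and raw capacity for n data drives."""
    return {"iops": min(n * drive_iops, ctrl_iops), "capacity_gb": n * drive_gb}

hdd = array_limits(24, drive_iops=180, drive_gb=1000, ctrl_iops=100_000)
ssd = array_limits(24, drive_iops=30_000, drive_gb=160, ctrl_iops=100_000)
print(hdd)  # spindles run out of IOPS (4320) long before the controller does
print(ssd)  # the controller caps IOPS; capacity (3840 GB) runs out instead
```

With spinning disks the controller is comfortably over-provisioned; with SSDs the same controller becomes the IOPS ceiling, which is the whole point of the comment above.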

SirWired

Re:All wrong. (1)

jon3k (691256) | more than 5 years ago | (#28820439)

"All SSD's do is flip the Capacity/IOPS equation on the back end. Before, you ran out of drive IOPS before ran out of capacity. Now, you get to run out of capacity before you run out of IOPS on the drive side."

Thank you so much for summarizing that point so succinctly, I'm stealing that line, hope you don't mind :)

Re:All wrong. (2, Interesting)

AllynM (600515) | more than 5 years ago | (#28821315)

Well said. I've found using an ICH-10R kills that overhead, and I have seen excellent IOPS scaling with SSDs right on the motherboard controller. I've hit over 190k IOPS (single sector random read) with queue depth at 32, using only 4 X25-M G1 units. The only catch is the ICH-10R maxes out at about 650-700 MB/sec on throughput.

Allyn Malventano
Storage Editor, PC Perspective

STOP USING SSDS AS HDS (-1, Flamebait)

Anonymous Coward | more than 5 years ago | (#28819825)

The solution is simple.

Do not use SSDs as HDs. Use them as replacements for media storage. They were never meant to be HD replacements.

The amount of data written to them at an OS level is ridiculous, let alone the added overhead for RAID. The limited write cycles will be exhausted much sooner than expected, beginning the deterioration of the disk and making it die much earlier than any HD's average lifespan.

I swear, do people even THINK anymore?

AC because of the torrent of morons that can't think that believe I'm wrong. You're wrong, and you're an idiot. Deal with it.

No matter how fat you make the pipe (1)

countertrolling (1585477) | more than 5 years ago | (#28820105)

Somebody will find a way to clog it up.

Where's our "paperless" society?

The next bottleneck? (1)

Sjefsmurf (1414991) | more than 5 years ago | (#28820255)

Nothing new here.

Anyone seriously into benchmarking or high-performance applications knows that RAID controllers have been a bigger bottleneck than the hard drives for ages already.

It's only in the last 2-3 years or so that RAID controllers have gotten fast enough to properly handle the performance of the 6 to 8 15k rpm drives a normal 2U server can hold, and even today many server RAID cards still cannot do this.

RAID card performance has easily been the biggest differentiator in server performance for anyone who needed a reasonable amount of I/O capacity. Most servers have been roughly equal in memory and network performance: they are all built around a very limited number of CPU and chipset architectures, there is only so much that can differ there, and it has been a long time since server gigabit network hardware couldn't fill a gigabit link.

RAID cards, on the other hand, have major differences in architecture, software and processors. Proper hardware RAIDs are basically small computers on a card: they have their own CPU, memory and OS. This isolates them from the host they are plugged into and protects the data even if the host crashes. The RAID card will normally not crash and holds all data in its battery-backed cache, which means a great deal for critical data and massively reduces the chance that you need to do consistency checks/validation and rebuilds after a crash on the host server, which is equally important.

Unfortunately as a result of all that extra complexity, you also get the potential for large performance variations between different raid cards.

That said, a good quality RAID card definitely helps performance in most scenarios and easily outperforms software RAID for most server usage that includes a reasonable amount of writing, as long as you have battery backup on your cache so you can safely enable write-back caching.

Just do your homework and make sure you get a good card when you shop. The better cards can easily be 2-3x faster than the worst.

Re:The next bottleneck? (1)

PiSkyHi (1049584) | more than 5 years ago | (#28820723)

If you do a cost comparison, software RAID beats hardware: instead of buying a fast RAID card, you can put that money toward more storage space and speed.
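
The software-RAID approach boils down to a few commands. A sketch using Linux mdadm, building a four-drive RAID-10 array (device names are placeholders; these commands destroy the contents of the named disks):

```shell
# Assumptions: mdadm installed, /dev/sdb..sde are scratch disks.
# RAID-10 trades half the raw capacity for redundancy plus striped speed.
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.ext4 /dev/md0           # put a filesystem on the array
mount /dev/md0 /mnt/fast
cat /proc/mdstat             # watch the initial sync progress
```

The parity and striping work runs on the host CPU, which is exactly the trade-off being argued: no controller card to buy, at the cost of host cycles and no battery-backed write cache.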

SSD killed the Raid(io) star (1)

Latinhypercube (935707) | more than 5 years ago | (#28820399)

SSD killed the RAID(io) star. Really, who needs the fuss of RAID? Unless it's for redundancy, there is no need for RAID as far as speed goes. SSDs are already bottlenecking the 3.0Gb/s SATA II interface. A single SSD can produce the same throughput as 4 RAIDed Raptors (= fast drives). Plus anyone can install an SSD into an existing setup, while RAID requires a lot of reinstalls, drivers, etc.

Re:SSD killed the Raid(io) star (1)

Rockoon (1252108) | more than 5 years ago | (#28821045)

The 3Gb/sec rating is on each individual port, not on the bridge between the SATA controller(s) and the CPU. The bridge between the controller and the CPU can theoretically max out the system bus (we measure that in GB/sec instead of Gb/sec). There are plenty of SATA controllers that push well over 3Gb/sec toward the system; it is only each individual SATA300 device that is capped at 3Gb/sec.
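
The per-port versus aggregate distinction is easy to put in numbers. A small sketch (the six-port count is illustrative, typical of an ICH-10R class controller; SATA's 8b/10b encoding means only 80% of the line rate is payload):

```python
# Per-port SATA II payload limit vs. aggregate device-side bandwidth.
SATA2_LINE_GBPS = 3.0  # line rate per port, in Gbit/s

# 8b/10b encoding: 8 payload bits per 10 line bits, then bits -> bytes.
payload_mb_s = SATA2_LINE_GBPS * 1e9 * (8 / 10) / 8 / 1e6  # MB/s per port

ports = 6  # illustrative six-port controller
aggregate_mb_s = ports * payload_mb_s

# Note: the controller-to-CPU link (DMI, PCIe, etc.) is a separate,
# shared ceiling and may be lower than this device-side aggregate.
print(f"per port: {payload_mb_s:.0f} MB/s, {ports} ports: {aggregate_mb_s:.0f} MB/s")
```

So each device sees a 300 MB/s ceiling, while the controller as a whole could in principle move several times that toward the system, until the host-side bridge becomes the limit.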

Re:SSD killed the Raid(io) star (0)

Anonymous Coward | more than 5 years ago | (#28821133)

... 4 raided Raptors (=fast drives)...

Raptors? Fast? In which universe?

Go back to your basement and play with your toy computers. We're talking real hardware here, son.

Colo Datacenters VS Real Datacenters (0)

Anonymous Coward | more than 5 years ago | (#28820617)

Not sure what Datacenters you have been visiting but it sounds like you need to get out more. In a standard Colocation datacenter you see a lot of data that lives in midgrade x86 server raid subsystems. You also see a lot of bakers racks filled with crap white box systems.

In a real datacenter the only raid seen is a raid 1 for the boot drives to get the server up into the operating system. The data lives on the SAN. If the server suffers a hardware failure or other problem the admin is able to assign the LUN to another server and get the application back up during the repairs. Clustered applications are even able to do this on their own and page the admin and let him know its time for a service call.

And of course you mention nothing about ZFS which is even able to judge the read and write speed of its devices. A raidz configuration of a mix between regular spindles and ssd's would be able to balance between the two depending on the needs of the operations involved.
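
One caveat on the mixed-pool idea: ZFS does not steer I/O by device speed inside a single raidz vdev, but it gets a similar effect through dedicated SSD log and cache devices. A sketch (device names are placeholders and the commands destroy their contents):

```shell
# Assumptions: ZFS installed, sdb-sde are spinning disks, sdf/sdg are SSDs.
zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde
zpool add tank log /dev/sdf     # SSD slog: absorbs synchronous writes
zpool add tank cache /dev/sdg   # SSD L2ARC: caches hot random reads
zpool status tank               # shows the raidz, log and cache vdevs
```

Synchronous writes land on the SSD log and are later flushed to the spindles, and frequently read blocks migrate to the SSD cache, so the pool balances work between flash and rust much as the comment describes.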

SSDs won't be in the datacenter for a long time. The 15k rpm fibre channel drives found in most EMC hardware are robust and scary fast, on top of being extremely fault tolerant with BCVs and multiple LUNs. I wonder how well SSDs would do in a real-world test of a multiple-LUN configuration, with 24-hour hammering from 5000 hosts on the other end of an 8Gb fibre link?

The Real Answer... (1)

billybob_jcv (967047) | more than 5 years ago | (#28821185)

...Our EMC sales rep has been putting the hard sell on us to buy some SSD product. I think they are worried about their profit margins on conventional drives, and they want to move customers to a product with a higher margin - and along the way they can also try to get you to upgrade head units, etc.

It's all a bit moot... (0)

Anonymous Coward | more than 5 years ago | (#28822605)

This is a pretty simplistic view. As a senior storage engineer, I have conversations like this quite often. At the enterprise storage level, the RAID controller hardware is not the piece that matters. More pertinently, there is a reasonable chance that in the next few years the RAID paradigm will pass into history, replaced by disk interface models that offer linear power/throughput growth in enterprise storage subsystems, such as IBM's XIV. It's certainly a quantum improvement in thinking, at least. It would also deal with all of those smug statistical analyses about RAID rebuild times growing in line with spindle size, to the point where a second disk failure, before the rebuild from the first completes, takes out the entire array.

data center (0)

Anonymous Coward | more than 5 years ago | (#28823305)

Currently network bandwidth is the limit in data centers. I don't see this changing anytime soon.