
Does ZFS Obsolete Expensive NAS/SANs?

kdawson posted more than 7 years ago | from the fast-secure-reliable-cheap dept.

Data Storage 578

hoggoth writes "As a common everyman who needs big, fast, reliable storage without a big budget, I have been following a number of emerging technologies and I think they have finally become usable in combination. Specifically, it appears to me that I can put together the little brother of a $50,000 NAS/SAN solution for under $3,000. Storage experts: please tell me why this is or isn't feasible." Read on for the details of this cheap storage solution.

Get a CoolerMaster Stacker enclosure like this one (just the hardware, not the software) that can hold up to 12 SATA drives. Install OpenSolaris and create ZFS pools with RAID-Z for redundancy. Export some pools with Samba for use as a NAS. Export some pools with iSCSI for use as a SAN. Run it over Gigabit Ethernet. Fast, secure, reliable, easy to administer, and cheap. Usable from Windows, Mac, and Linux. As a bonus, ZFS lets me create daily or hourly snapshots at almost no cost in disk space or time.
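For readers who haven't touched ZFS, here is a minimal, hypothetical sketch of the administration steps being described (pool creation, a filesystem to export over Samba, a volume for iSCSI, and a snapshot), driven from Python purely for illustration. The device names, sizes, and dataset names are placeholders, and the shareiscsi property only existed in the OpenSolaris builds of that era, so treat this as a sketch rather than a recipe.

    #!/usr/bin/env python
    # Hypothetical sketch: drive the standard zpool/zfs CLI from a script.
    # Device names c1t0d0..c1t3d0 and the "tank" pool name are placeholders.
    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.check_call(cmd)

    disks = ["c1t0d0", "c1t1d0", "c1t2d0", "c1t3d0"]      # placeholder devices
    run("zpool", "create", "tank", "raidz", *disks)       # single-parity RAID-Z pool
    run("zfs", "create", "tank/share")                    # filesystem; point smb.conf at its mountpoint
    run("zfs", "set", "compression=on", "tank/share")
    run("zfs", "create", "-V", "100G", "tank/iscsivol")   # zvol to expose as an iSCSI LUN
    # In OpenSolaris builds of that era the zvol could be exported with the
    # since-removed shareiscsi property, e.g.:
    # run("zfs", "set", "shareiscsi=on", "tank/iscsivol")
    run("zfs", "snapshot", "tank/share@initial")          # near-free snapshot
    run("zpool", "status", "tank")                        # verify pool health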

Total cost: 1.4 Terabytes: $2,000. 7.7 Terabytes: $4,200 (Just the cost of the enclosure and the drives). That's an order of magnitude less expensive than other solutions.
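Taking the submitter's own figures at face value, the order-of-magnitude claim is easy to check; the only added assumption below is that the $50,000 box is of comparable capacity to the 7.7TB build.

    # Back-of-the-envelope $/GB using only the figures quoted above.
    configs = {
        "DIY 1.4 TB": (2000, 1.4),
        "DIY 7.7 TB": (4200, 7.7),
        "$50k NAS/SAN": (50000, 7.7),   # assumption: similar capacity to the big DIY build
    }
    for name, (dollars, tb) in configs.items():
        print(f"{name:>13}: ${dollars / (tb * 1000):.2f} per GB")
    # Roughly $1.43/GB and $0.55/GB for the DIY builds vs. ~$6.49/GB for the
    # quoted system -- about an order of magnitude, if the capacities really match.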

Add redundant power supplies, NIC cards, SATA cards, etc as your needs require.


ZFS (5, Informative)

Anonymous Coward | more than 7 years ago | (#19319859)

Also should be noted that FreeBSD has added ZFS support to Current (v7). It's built on top of GEOM too so if you know what that is you can leverage that underneath zfs.

Re:ZFS (4, Interesting)

ggendel (1061214) | more than 7 years ago | (#19319871)

I think you're a bit high. I put together a 5 x 500GB SATA II disk setup with RAID-Z in a 5-disk enclosure for under $1,000. I run it off my Sunfire v20z. That's 2TB for under $1,000 USD!

Re:ZFS (0)

Anonymous Coward | more than 7 years ago | (#19320079)

The best part is that when FreeNAS starts using the FreeBSD v7.0 core... everyone will be able to utilize it. Not that _I_ like FreeNAS; but people on /., like people in general, can't handle the console. I prefer to run BSD and Solaris bare-bones, but FreeNAS is probably a good idea for those who can only use a mouse.

Have you tried out Starfish? (3, Informative)

msporny (653636) | more than 7 years ago | (#19320409)

Ever heard of Starfish? It's a new distributed clustered file system:

Starfish Distributed Filesystem [digitalbazaar.com]

From the website:

Starfish is a highly-available, fully decentralized, clustered storage file system. It provides a distributed POSIX-compliant storage device that can be mounted like any other drive under Linux or Mac OS X. The resulting fault-tolerant storage network can store files and directories, like a normal file system - but unlike a normal file system, it can handle multiple catastrophic disk and machine failures.

And you can build clusters at relatively low cost:

For a 2-way redundant, RAID-1 protected, 1.0 Terabyte cluster: $2,000 (Jan 2007 prices). Per server, that breaks down into around $400 for an AMD 2.6GHz CPU, 1GB of memory, and a motherboard with an integrated 100 megabit LAN connection, SATA support, a 350 watt power supply and a commodity server enclosure. Four SATA 500GB hard drives will run you around $600. The cluster would ensure proper file system operation even through the catastrophic failure of a single machine. Hard drive failure rates could even approach 50% without affecting the Starfish file system.
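One reading of that breakdown (four 500GB drives in each of two servers, mirrored inside each box and then replicated 2-way across boxes) does reproduce the quoted numbers; this layout is an assumption, not something the post states outright.

    # Assumed layout: 2 servers, each with 4 x 500 GB drives.
    servers, drives_per_server, drive_gb = 2, 4, 500
    raw_tb = servers * drives_per_server * drive_gb / 1000.0   # 4.0 TB raw
    usable_tb = raw_tb / 2 / 2     # halved by RAID-1, halved again by 2-way replication
    cost = servers * (400 + 600)   # $400 base box + $600 of drives, per server
    print(usable_tb, "TB usable for $", cost)                  # 1.0 TB for $2000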

(warning: I work for the company that created Starfish)

-- manu

Re:Have you tried out Starfish? (1)

hirschma (187820) | more than 7 years ago | (#19320565)

Why is Starfish better than pNFS [pnfs.com] ? How much does the software cost?

Specifics please. (5, Insightful)

PowerEdge (648673) | more than 7 years ago | (#19319863)

Not enough specifics here. I am going to say do your thing. If it works, you're a hero and saved $47k. If it doesn't, obfuscate and negotiate the $50k of storage down to $47k. Win for all.

Unless you would like to give more specifics. Cause I am going to say in 99% of cases where you want fast, reliable, and cheap storage you only get to pick two.

Re:Specifics please. (0)

Anonymous Coward | more than 7 years ago | (#19319937)

fast, reliable, and cheap storage you only get to pick two.

Well, in this case they went with cheap consumer-grade hardware, meaning that the drives will probably not be the part most likely to fail.

Re:Specifics please. (4, Insightful)

tgatliff (311583) | more than 7 years ago | (#19320141)

"comsumer-grade hardware"???

Do you honestly believe the slogan of "business-grade"? Come on, let the marketing jargon go. Hardware designs are expensive, so rarely are there multiple designs. The sales guys are selling you additional support, but the hardware is rarely different. If it is, then the volume is not there, so the reliability is actually worse. Volume is the king of reliability. Reliability is always more dependent on the age of the design and its volume than on the intended customer...


Re:Specifics please. (5, Insightful)

Ngarrang (1023425) | more than 7 years ago | (#19320343)

Unless you would like to give more specifics. Cause I am going to say in 99% of cases where you want fast, reliable, and cheap storage you only get to pick two.

I disagree completely. Computer hardware is a commodity. The big box makers are afraid of this very kind of configuration which would blow them out of business if more people caught on to it. No, they use FUD to convince PHBs that because of the low cost, it cannot possibly be as good. Hot-swap and hot-spare are commodity technologies. But, please, feel free to continue the FUD, because it helps the bottom line.

Everyman? (1, Redundant)

iknownuttin (1099999) | more than 7 years ago | (#19319879)

As a common everyman who needs big, fast, reliable storage without a big budget, ...

Porn jokes aside, what in the World does a common "everyman" need with that kind of storage?

I have a 40 gig OEM drive on this machine that I've had since 2003, and I still haven't approached the halfway mark. And I run a couple of businesses.

Re:Everyman? (3, Insightful)

apathy maybe (922212) | more than 7 years ago | (#19319941)

Porn jokes indeed aside. I may not be an "everyman", but I think I'm close enough. My desire for storage (though not yet in the terabyte range) comes from my photography (no, not porn...). I take a bunch of pictures, and, well, because storage is cheap I leave them all at the original file size (which in this case is about 2-5 MB, depending).

I don't have a proper video camera, but I'm sure that people who do, have even bigger storage requirements.

Not only that, what with all the music you can copy off a friend's HD now, your storage just jumps a bit more! (I've got literally more than 10 gigabytes of music on my desktop HD. And I know people who have hundreds of CDs, so if they ripped all those, they would have much more...)

Added to all those movies you can either rip or download...
Chuck in a decent network, family and/or friends, and you can now stream all this stuff around to wherever you want it.

I'd say, then, that the most common use of all this space is multimedia. Not sure who has terabytes of multimedia, though.

Re:Everyman? (3, Insightful)

Baddas (243852) | more than 7 years ago | (#19320041)

150GB mp3s
80GB DVDs
120GB games
14GB/hr for DV editing
1 whole drive for OSes
RAID-5ed (1 parity drive)

So I'm up to four 200GB drives right now, without even trying hard.

Soon I'm going to jump to 500GB drives, and I expect to be hitting their limits in a year or so.

Also, how the hell am I supposed to back up all this?! Incrementals would be 10GB+ per week.
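Rough arithmetic on the figures above, treating the OS drive separately and assuming the four 200GB drives form one RAID-5 set:

    # Quick tally of the storage listed above against 4 x 200 GB drives in RAID-5.
    media_gb = 150 + 80 + 120           # MP3s + DVDs + games = 350 GB
    drives, size_gb = 4, 200
    usable_gb = (drives - 1) * size_gb  # one drive's worth lost to parity = 600 GB
    dv_hours = (usable_gb - media_gb) / 14
    print(f"{usable_gb - media_gb} GB free, ~{dv_hours:.0f} hours of DV at 14 GB/hr")
    # Only ~250 GB of headroom, i.e. under 18 hours of DV footage -- hence the
    # jump to 500 GB drives.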

Re:Everyman? (1)

miller701 (525024) | more than 7 years ago | (#19320199)

150 GB of MP3s? That's on the order of 3,600 albums.

Re:Everyman? (-1)

Anonymous Coward | more than 7 years ago | (#19320253)

Yes, and?

Re:Everyman? (2, Interesting)

d3ac0n (715594) | more than 7 years ago | (#19320337)

Two words: High Bitrate.

If you like your music to actually SOUND good, 128kbps sucks. I personally rip my music using a Variable bitrate between 224 and 320 kbps. Unfortunately, this makes for VERY large files. But my music sounds FANTASTIC!

Re:Everyman? (0, Flamebait)

miller701 (525024) | more than 7 years ago | (#19320583)

I ripped most of my collection back in the OS9 pre-iPod iTunes days. Now that I have a kid and a house to take care of, I don't think I'll be re-ripping unless the computer dies. No judgement, just different priorities for me now. Peace

Re:Everyman? (1)

bkr1_2k (237627) | more than 7 years ago | (#19320447)

I don't know about you, but I have about 700 CDs ripped to about 50GB. That's about 2100 albums, which really isn't that much these days. Especially for music collectors. I don't even have music that is hugely dynamic, for the most part, so my compression is reasonably good with VBR set with minimum of 190 or something like that. For music that has a lot of dynamic sound bitrates are going to be higher. 150 GB is nowhere near 3600 albums if you actually want the music to sound close to correct.
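The two estimates aren't actually in conflict once bitrate is factored in; a quick calculation, assuming roughly 48 minutes per album:

    # Albums per collection at various average bitrates (~48 minutes per album assumed).
    album_seconds = 48 * 60
    for kbps in (128, 192, 256, 320):
        mb_per_album = kbps * 1000 / 8 * album_seconds / 1e6
        print(f"{kbps} kbps: {mb_per_album:4.0f} MB/album, "
              f"{150_000 / mb_per_album:5.0f} albums in 150 GB")
    # ~46 MB/album at 128 kbps (~3,300 albums in 150 GB), dropping to ~1,300 at
    # 320 kbps; and 700 CDs in 50 GB works out to ~71 MB/album, i.e. ~200 kbps average.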

Re:Everyman? (1)

mulvane (692631) | more than 7 years ago | (#19320247)

I can easily see the need for mass storage in the HOME with multimedia and other types of storage. Sure, the web makes things easy to acquire, but that is still never gonna compare to instant over a home network.

At home, I have an 8TB movie server that's using all 200GB drives. The drive enclosure is custom built, is using multiple 5 DC power supply bricks, and also has SATA II RAID 5 capable expanders to create the first level of RAID 5. From there I go to an 8-port SATA II card and create two RAID 5 arrays using two 4-port groups. At that point, the whole thing is brought into a RAID 0. The ONLY thing this is used for is storage of video (movies/TV episodes).

I also have a 4TB video capture server with 6 TV capture cards. Each capture card can be controlled from a remote media center PC for live streaming and recording at the same time. I also have two 2TB file servers with an archive of ISOs from quite literally every CD that has ever entered my possession (minus AOL) since 1998, and various other things of personal business like pictures and videos I shoot of the family. I have a couple of 500GB machines floating around as servers for other things, but those don't count for anything important here.

As to the article's question, I have built a fast, reliable, and easy-to-manage solution with Samba and web frontends to handle all the file serving needs I have. I look forward to the day I can eBay all these 200GB drives, though, after I replace them with 750GB-1TB drives in the near future.

Terabyte (1)

palladiate (1018086) | more than 7 years ago | (#19320149)

I have a terabyte RAID in my main computer, and a 2 TB fileserver. With my video editing, I'm starting to look at a full fileserver (still have most of my main box empty). And, I'm not even a pirate, and I've just been a computer and video hobbyist for 2 decades now. I'm not even that serious about my hobbies.

I can imagine we'll all need terabytes of capacity in the next few years. Some games are already into the 14-20GB sizes.

Ah, digital photography. (1)

iknownuttin (1099999) | more than 7 years ago | (#19320241)

My desire for storage (though not yet in the terrabyte range) comes from my photography (no not porn...). I take a bunch of pictures, and well, because storage is cheap I leave them all at the original file size (which in this case is about 2-5 MB depending).

Ah yes, digital photography. It's a good thing I asked because I'm in the (gradual) process of moving away from film. Which means, I'll be having a similar problem as yourself. If you do it, I hope you post your results. I will be really interested!

BTW, a 5 MB photo file sounds very small - even if it's a JPG. I'm assuming you made a typo and you're storing a 2-5 megapixel image, which would be, what, at least 15 megabytes? Even more reason for the setup you're talking about!
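The 15-megabyte figure is roughly right for an uncompressed image, which is where the confusion comes from; the arithmetic (3 bytes per pixel for 8-bit RGB):

    # Uncompressed size of an 8-bit RGB image vs. a typical camera JPEG.
    for megapixels in (2, 5, 8):
        uncompressed_mb = megapixels * 3     # 3 bytes per pixel
        print(f"{megapixels} MP: ~{uncompressed_mb} MB uncompressed")
    # A 5 MP frame is ~15 MB uncompressed (TIFF/RAW territory), but JPEG
    # compression typically shrinks it to the 2-5 MB the parent describes.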

Re:Ah, digital photography. (2, Interesting)

liquidpele (663430) | more than 7 years ago | (#19320513)

Be sure you back all your pictures up on two different drives if you're going digital. I had a friend whose photographer's computer ate most of her wedding pictures. There was a lawsuit and it wasn't pretty.

Re:Everyman? (1)

WhoBeDaPlaya (984958) | more than 7 years ago | (#19320383)

Just beefed up the ol' media server / HTPC with 8x 750GB Seagates (got great deals on 'em). Holds vid caps, FLACs (~18K songs), family photos and other misc vids (game/movie trailers, etc.). Still have room for another 6 drives.

Re:Everyman? (1)

eldepeche (854916) | more than 7 years ago | (#19319961)

His business probably needs hundreds of DVD rips and 30000 mp3s.

Re:Everyman? (1, Funny)

Anonymous Coward | more than 7 years ago | (#19320175)

Arrrr, matey! That's a scurrrrilous rumor, ye salty sea dog!

Re:Everyman? (1)

theStorminMormon (883615) | more than 7 years ago | (#19320029)

I have a 40 gig OEM drive on this machine that I've had since 2003, and I still haven't approached the half way mark.

And you're obviously not storing any substantial media. I have more than 40GB just of MP3s.

Re:Everyman? (2, Insightful)

simong (32944) | more than 7 years ago | (#19320087)

You're not trying hard enough ;)

I've got just over a terabyte of live storage around the house and I probably use about half of it - I have a couple of hundred gigs of video and about 60 gigs of music. I know of someone who is currently buying seven of Hitachi's new terabyte HDs for an in-home video streaming system. There's always someone who has a use for it.

Re:Everyman? (4, Informative)

Max von H. (19283) | more than 7 years ago | (#19320095)

I'm a photographer and my RAW image files are 15MB each. After every shoot, I come back with 1 to 8GB worth of data to be processed. My workflow involves working on 16-bit TIFFs that weigh in excess of 40MB per file, and I'm not even counting the Photoshop work files. 40GB would last less than a week here.

Not being rich, I have a couple of external HDs totalling a little less than 1TB, and it's nearly full. The rest is archived on DVD or transferred to HD for storage (cheaper, faster and more reliable than DVD).

So yeah, I can easily imagine why any organisation dealing with huge media files would be interested. Heck, I'd be a client for a safe, multi-TB storage system if I could afford it... Not everybody only deals with text files for a living :P

Re:Everyman? (2, Funny)

iknownuttin (1099999) | more than 7 years ago | (#19320539)

Not everybody only deals with text files for a living

Well, I'll have to buy a digital camera that shoots in ASCII. Oh wait.......

Re:Everyman? (2, Interesting)

Znork (31774) | more than 7 years ago | (#19320435)

"what in the World does a common "everyman" need with that kind of storage?"

Consolidate your multimedia and run MythTV for a while. Once you rip and encode several TV series and all your DVD films, and have Myth recording your favourite shows, a terabyte doesn't seem that much. If you want an idea of future examples of massive storage consumption, imagine having MythTV record all channels all the time, so you'd basically be able to decide post-transmission what you want to view and save...

Of course, while I agree most NAS and SAN solutions are grotesquely overpriced and mainly useful for separating fools from their money, I can't really see why one would bring up ZFS and OpenSolaris for this purpose. Something like Openfiler [openfiler.com] would be vastly more appropriate, proven and easy to manage.

Re:Everyman? (1)

kannibul (534777) | more than 7 years ago | (#19320507)

Maybe not everyman, but...

With the recent changes to the laws for business, having to retain data and versions of documents for YEARS has become a reality (otherwise, one can be found liable of "virtual shredding"). With that, cheap, fast, and large storage is a must.

Re:Everyman? (3, Interesting)

Mr Z (6791) | more than 7 years ago | (#19320569)

I actually have two 48GB databases full of minimal instruction sequences for generating boolean functions. Do I win the obscure use of disk space prize?

Congratulations, you discovered the "File Server" (4, Informative)

BigBuckHunter (722855) | more than 7 years ago | (#19319881)

For quite a while now, it has been less expensive to build a DIY file server than to purchase NAS equipment. I personally build gateway/NAS products using Via C7/8 boards as they are low power, have hardware encryption, and are easy to work with under Linux. Accessory companies even make backplane drive cages for this purpose that fit nicely into commodity PCs.

BBH

Re:Congratulations, you discovered the "File Serve (1)

tokul (682258) | more than 7 years ago | (#19320183)

For quite a while now, it has been less expensive to build a DIY file server than to purchase NAS equipment.
Depends on the needed storage space. If you need more than 1-3 TB, you can't use generic components; the price goes up and the hardware starts taking more space than a dedicated NAS box. Unless Tyan 2U-4U boxes count as DIY in your country.

Re:Congratulations, you discovered the "File Serve (1)

Wdomburg (141264) | more than 7 years ago | (#19320433)

Depends on what you mean by "generic" components. There are plenty of generic server cases out there designed for storage applications. A 2U rack enclosure with 12 drive bays can be had for around a grand. Add a twelve-port RAID controller for six or seven hundred and you've got the capacity to expand to 6TB raw even if you stick to 500GB drives (currently the best price per GB).

ok for low end, not for high (3, Informative)

alen (225700) | more than 7 years ago | (#19319885)

The place where I work looked at one of these things from another company. We did the math and it's too slow, even over gigabit, for database and Exchange servers. OK for regular file storage, but not for heavy I/O needs.

Re:ok for low end, not for high (4, Insightful)

morgan_greywolf (835522) | more than 7 years ago | (#19320121)

Precisely. The question in the title is a little bit like asking "Will large PC clusters obsolete mainframes?" or "Will Web applications obsolete traditional GUI applications?" The answer is, as always, "It depends on what you use it for." For high-performance databases or a high-traffic Exchange server, these things may not work well.

I've seen plenty of iSCSI solutions coupled with NAS servers in this price range that get pretty good throughput and are already integrated and ready to go, but the bottom line is that if you want high-performance, high-availability storage for I/O-intensive applications, you need a fiber SAN/NAS solution.

Re:ok for low end, not for high (4, Insightful)

Jim Hall (2985) | more than 7 years ago | (#19320205)

I agree. At my work, we have a SAN ... low-end frames (SATA) to mid-range (FC+SATA) to high-end frames (FC.) We put a front-end on the low-end and mid-range storage using a NAS, so you can still access using the storage fabric or over IP delivery. Having a SAN was a good idea for us, as it allowed us to centralize our storage provisioning.

I'm familiar with ZFS and the many cool features laid out in this Ask Slashdot. The simple answer is: ZFS isn't a good fit to replace expensive SAN/NASs. However, ZFS on a good server with good storage might be a way to replace an inexpensive SAN/NAS. Depending on your definition of "inexpensive." And if you don't mind the server being your single point of failure.

And more importantly. (4, Funny)

Spazntwich (208070) | more than 7 years ago | (#19319891)

Does the overuse of TLAs obfuscate the meaning of SDS?

Current issues (4, Informative)

packetmon (977047) | more than 7 years ago | (#19319903)

I've snipped out the worst reasons as per Wiki entry:

  • A file "fsync" will commit to disk all pending modifications on the filesystem. That is, an "fsync" on a file will flush out all deferred (cached) operations to the filesystem (not the pool) in which the file is located. This can make some fsync() slow when running alongside a workload which writes a lot of data to filesystem cache.
  • ZFS encourages creation of many filesystems inside the pool (for example, for quota control), but importing a pool with thousands of filesystems is a slow operation (can take minutes).
  • ZFS filesystem on-the-fly compression/decompression is single-threaded. So, only one CPU per zpool is used.
  • ZFS eats a lot of CPU when doing small writes (for example, a single byte). There are two root causes, currently being solved: a) Translating from znode to dnode is slower than necessary because ZFS doesn't use translation information it already has, and b) Current partial-block update code is very inefficient.
  • ZFS Copy-on-Write operation can degrade on-disk file layout (file fragmentation) when files are modified, decreasing performance.
  • ZFS blocksize is configurable per filesystem, currently 128KB by default. If your workload reads/writes data in fixed sizes (blocks), for example a database, you should (manually) configure the ZFS blocksize equal to the application blocksize, for better performance and to conserve cache memory and disk bandwidth (see the example after this list).
  • ZFS only offlines a faulty hard disk if it can't be opened. Read/write errors or slow/timed-out operations are not currently used in the faulty/spare logic.
  • When listing ZFS space usage, the "used" column only shows non-shared usage. So if some of your data is shared (for example, between snapshots), you don't know how much is there. You don't know, for example, which snapshot deletion would give you more free space.
  • Current ZFS compression/decompression code is very fast, but the compression ratio is not comparable to gzip or similar algorithms.
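As the blocksize bullet notes, matching the ZFS recordsize to an application's I/O size is a one-line tunable. A hypothetical example for a database doing 8KB I/O, scripted only for consistency with the other sketches in this discussion (the dataset name is a placeholder, and the property only affects blocks written after it is set):

    import subprocess
    # Hypothetical: dedicate a dataset to a database that does 8 KB I/O and set
    # the ZFS recordsize to match before loading any data.
    subprocess.check_call(["zfs", "create", "tank/db"])
    subprocess.check_call(["zfs", "set", "recordsize=8K", "tank/db"])
    subprocess.check_call(["zfs", "get", "recordsize", "tank/db"])   # verify the setting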

Re:Current issues (1)

ZorinLynx (31751) | more than 7 years ago | (#19319967)

ZFS does not do user quotas. If you want to do user quotas you need to create a filesystem per user. Filesystems are easy to create but a filesystem per user gets cumbersome if you have thousands of users, not to mention having to have thousands of NFS exports and making backups a greater headache.

Really, Sun, you gotta fix this. At least give users a choice as to what to use: separate filesystems or user quotas (or a combination of both).
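Until real user quotas show up, the workaround the parent describes (one filesystem per user, each with its own quota) looks roughly like this; user names, sizes, and the pool layout are placeholders:

    import subprocess

    def make_home(user, quota="10G"):
        fs = f"tank/home/{user}"                                  # placeholder pool/layout
        subprocess.check_call(["zfs", "create", fs])
        subprocess.check_call(["zfs", "set", f"quota={quota}", fs])
        subprocess.check_call(["zfs", "set", "sharenfs=on", fs])  # one NFS export per user

    for user in ("alice", "bob"):                                 # placeholder users
        make_home(user)
    # Trivial for two users; the parent's point is that it gets ugly with thousands.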

vs Reiser4 (someday, maybe) (5, Informative)

SanityInAnarchy (655584) | more than 7 years ago | (#19320273)

Some of these issues looked familiar, so I thought I'd do a basic comparison:

Reiser4 had the same problems with fsync -- basically, fsync called sync. This was because their sync is actually a damned good idea -- wait till you have to (memory pressure, sync call, whatever), then shove the entire tree that you're about to write as far left as it can go before writing. This meant awesome small-file performance -- as long as you have enough RAM, it's like working off a ramdisk, and when you flush, it packs them just about as tightly as you can with a filesystem. It also meant far less fragmentation -- allocate-on-flush, like XFS, but given a gig or two of RAM, a flush wasn't often.

The downside: Packing files that tightly is going to fragment more in the long run. This is why it's common practice for defragmenters to insert "air holes". Also, the complexity of the sync process is probably why fsync sucked so much. (I wouldn't mind so much if it was smarter -- maybe sync a single file, but add any small files to make sure you fill up a block -- but syncing EVERYTHING was a mistake, or just plain lazy.) Worse, it causes reliability problems -- unless you sync (or fsync), you have no idea if your data will be written now, or two hours from now, or never (given enough RAM).

(ZFS probably isn't as bad, given it's probably much easier to slice your storage up into smaller filesystems, one per task. But it's a valid gotcha -- without knowing that, I'd have just thrown most things into the same huge filesystem.)

There's another problem with reliability: Basically, every fast journalling filesystem nowadays is going to do out-of-order write operations. Entirely too many hacks depend on ordered writes (ext3 default, I think) for reliability, because they use a simple scheme for file updating: Write to a new temporary file, then rename it on top of the old file. The problem is, with out-of-order writes, it could do the rename before writing the data, giving you a corrupt temporary file in place of the "real" one, and no way to go back, even if the rename is atomic. The only way to get around this with traditional UNIX semantics is to stick to ordered writes, or do an fsync before each rename, killing performance.

I think the POSIX filesystem API is too simplistic and low-level to do this properly. On ordered filesystems, tempfile-then-rename does the Right Thing -- either everything gets written to disk properly, or not enough to hurt anything. Renames are generally atomic on journalled filesystems, so either you have the new file there after a crash, or you simply delete the tempfile. And there's no need to sync, especially if you're doing hundreds or thousands of these at once, as part of some larger operation. Often, it's not like this is crucial data that you need to be flushed out to disk RIGHT NOW, you just need to make sure that when it does get flushed, it's in the right order. You can do a sync call after the last of them is done.
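For the record, the tempfile-then-rename idiom being discussed, including the fsync calls that out-of-order filesystems force on you, looks like this; a generic POSIX sketch, not tied to any particular filesystem:

    import os, tempfile

    def atomic_write(path, data):
        """Write-temp-then-rename, with the fsyncs that reordered writes require."""
        d = os.path.dirname(path) or "."
        fd, tmp = tempfile.mkstemp(dir=d)     # temp file on the same filesystem as 'path'
        try:
            os.write(fd, data)
            os.fsync(fd)                      # flush file data before it becomes visible
        finally:
            os.close(fd)
        os.rename(tmp, path)                  # atomic replacement on POSIX filesystems
        dfd = os.open(d, os.O_RDONLY)         # fsync the directory so the rename persists
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)

    atomic_write("settings.conf", b"key=value\n")
    # Skip the fsyncs and, on a filesystem that reorders writes, a crash can leave
    # an empty or corrupt "settings.conf" -- exactly the failure mode described above.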

Problem is, there are tons of other write operations for which it makes a lot of sense to reorder things. In fact, some disks do that on a hardware level, intentionally -- nvidia calls it "native command queuing". Using "ordered mode" is just another hack, and its drawback is slowing down absolutely every operation just so the critical ones will work. But so many are critical, when you think about it -- doesn't vim use the same trick?

What's needed is a transaction API -- yet another good idea that was planned for someday, maybe, in Reiser4. After all, single filesystem-metadata-level operations are generally guaranteed atomic, so I would guess most filesystems are able to handle complex transactions -- we just need a way for the program to specify it.

The fragmentation issue I see as a simple tradeoff: Packing stuff tightly saves you space and gives you performance, but increases fragmentation. Running a defragger (or "repacker") every once in a while would have been nice. Problem is, they never got one written. Common UNIX (and Mac) philosophy is that a defragger isn't needed when your filesystem isn't as likely to fragment, but I disagree -- just try downloading a few hundred gigs via BitTorrent, and watch any filesystem start to suffer. I gave up Reiser4 for XFS, because my Reiser4 was becoming corrupt enough to need an fsck on a regular basis. It wasn't losing much, but occasionally I'd lose a random binary here and there -- nothing I couldn't reinstall, but it scared the living crap out of me to be losing random files.

But then, XFS is out-of-order by default, so maybe I should just give up on expecting an FS to survive a power outage unscathed and invest in a UPS.

ZFS supports copy-on-write, and let's face it: If you implement it efficiently, you're going to have fragmentation. Implement it inefficiently, and you're going to waste a TON of space. Imagine you have a 10 gig file, and you write 1 gig in the middle of it -- if it's a copy-on-write file, you either have to leave 9 gigs of the file where it was, and relocate 1 gig -- fragmenting one copy -- or you have to copy the entire file, wasting 9 gigs of space. You can see similar tradeoffs all over the place. The only way around this, I would think, is to run a defragger of some sort, and base it on actual usage patterns (not theoretical). And you still have a tradeoff -- space vs speed, although even that isn't clear-cut, as with a smart enough cache, and assuming both copies are used often, being able to cache the identical parts only once would help.

It sounds like ZFS also supports something like lzo compression. And yes, the compression ratio is not comparable to gzip, but on most hardware, even if you weren't needing the CPU for anything else, I would guess that gzip compression is slower than the disk. Reiser4's plan here was to allow multiple "compression plugins", meaning you could specify for a file (or group of files, or whole tree, or filesystem) which compression algorithm you wanted. Default was lzo, but you could also use gzip, and probably even bzip2, eventually. The theory was, on most hardware, lzo compression is at least as fast as writing it straight to disk, and therefore faster if it saves you any space at all -- but sometimes, you prefer to save more space and have it perform worse, so you'd want gzip. (There it is again -- space vs performance.)
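The space-versus-speed tradeoff described here is easy to see with the compressors in the Python standard library; zlib's levels stand in for the lzo-vs-gzip choice (lzo itself isn't in the stdlib), so the numbers are only indicative:

    import time, zlib

    data = b"fairly repetitive log-style payload 1234567890\n" * 200_000   # ~10 MB

    for level in (1, 6, 9):               # 1 ~ "fast/lzo-ish", 9 ~ "gzip -9"
        t0 = time.perf_counter()
        out = zlib.compress(data, level)
        dt = time.perf_counter() - t0
        print(f"level {level}: {len(data) / len(out):5.1f}x smaller in {dt * 1000:6.1f} ms")
    # The usual result: the fast level gets most of the ratio at a fraction of the
    # CPU cost, which is the argument made above for lzo-by-default.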

Then again, if it is lzo, you have to have some damned fast storage before it being single-threaded even starts to become an issue. I'd be surprised if you could fill that one CPU -- but even if you just about could, that means you have the other CPUs free for any real work you're doing. That's not an excuse to make it single-threaded if it doesn't have to be, but I don't see it as a practical concern. (But maybe I don't know; I don't work on multi-terabyte arrays.)

The rest of your points are probably valid, and sound like bugs (not flaws inherent to the design, as, for example, copy-on-write fragmentation or out-of-order writes causing corruption).

ZFS FUD YAY (1)

toby (759) | more than 7 years ago | (#19320295)

It's a pity your summary fails to emphasise: Most of these are known bugs; some are likely already fixed; and a couple are simply wrong.

But it's all beside the point. ZFS still offers data integrity that no other hardware or software system does (some other systems do provide copy on write, pool-like manageability, and so on). One day we'll all be using it (or a clone).

Re:Current issues (1)

allenw (33234) | more than 7 years ago | (#19320503)

I'm surprised your list doesn't include the inability to evacuate a drive. This is one of the biggest problems with ZFS right now. Makes it a real pain to upgrade drives.

Real SANs do more (4, Informative)

PIPBoy3000 (619296) | more than 7 years ago | (#19319909)

For starters, our SAN uses extremely fast connectivity. It sounds like you're moving your disk I/O over the network, which is a fairly significant bottleneck (even Gb). We also have the flexibility of multiple tiers - 1st tier being expensive, fast disks, and 2nd tier being cheaper IDE drives. I imagine you can fake that a variety of ways, but it's built in. Finally, there's the enclosure itself, with redundant power and such.

Still, I bet you could do what you want on the cheap. Being in health care, we find response time and availability really are life-and-death, but many other industries don't need to spend the extra. Best of luck.

Its just not the same thing. (4, Informative)

Tester (591) | more than 7 years ago | (#19319913)

A good $20k RAID array does much more. First, it doesn't use cheap SATA drives, but Fibre Channel or even SAS drives, which are tested to a higher level of quality (each disk costs $500 or more). And those cheap SATA drives also react much more poorly to non-sequential access (like when you have multiple users). They are unusable for serious file serving. You can never compare RAID arrays that use SATA/IDE to ones that use enterprise drives like FC/SCSI/etc., because the drives are quite different.

Then you have the other features like dual redundant everything: controllers, power supplies, etc. Then you have thermal capabilities of rack-mount solutions that often are different from SATA, etc, etc.

Re:Its just not the same thing. (5, Informative)

ZorinLynx (31751) | more than 7 years ago | (#19320017)

These overpriced drives aren't all that much different from SATA drives. They're a bit faster, but a HELL of a lot more expensive, and not worth paying more than double per gig.

We have a Sun X4500 which uses 48 500GB SATA drives and ZFS to produce about 20TB of redundant storage. The performance we have seen from this machine is amazing. We're talking hundreds of gigabytes per second and no noticeable stalling on concurrent accesses.

Google has found that SATA drives don't fail noticeably more often than SAS/SCSI drives, but even if they did, having several hot spares means it doesn't matter that much.

SATA is a great disk standard. You get a lot more bang for your buck overall.

Re:Its just not the same thing. (2, Insightful)

Lumpy (12016) | more than 7 years ago | (#19320365)

Show me how you build a RAID 50 of 32 SATA or IDE drives.

Also show me a SINGLE SATA or IDE drive that can touch the data I/O rates of a U320 SCSI drive with a 15K spindle speed.

Low-end consumer drives can't do the high-end stuff. Don't even try to convince anyone of this. Guess what, those uses are nowhere near strange for big companies. With a giant SQL DB you want... no, you NEED the fastest drives you can get your hands on, and that is SCSI or Fibre Channel.

Re:Its just not the same thing. (2, Insightful)

RedHat Rocky (94208) | more than 7 years ago | (#19320553)

Comparing drive to drive, I agree with you; 15k wins.

However, the price point on 15k drives is such that a comparison of a single drive vs. multiple drives is reasonable. The basis is $/GB, not drive A vs. drive B.

Ask Google how they get their throughput on their terabyte datasets. Hint: it's not due to 15k drives.

Re:Its just not the same thing. (2, Interesting)

drsmithy (35869) | more than 7 years ago | (#19320599)

Show me how you build a RAID 50 of 32 SATA or IDE drives.

Get yourself a nice big rackmount case and some 8 or 16 port SATA controllers.

Also show me a SINGLE SATA or IDE drive that can touch the data I/O rates of a U320 SCSI drive with a 15K spindle speed.

Of course, the number of single drives you can buy for the cost of that single 15k drive will likely make a reasonable showing...

Low-end consumer drives can't do the high-end stuff. Don't even try to convince anyone of this. Guess what, those uses are nowhere near strange for big companies. With a giant SQL DB you want... no, you NEED the fastest drives you can get your hands on, and that is SCSI or Fibre Channel.

This is technically true; however, "low-end consumer" drives can certainly beat the performance of what *used* to be "high end" until relatively recently, and are quite adequate for a significant proportion of the market.

Individually, 7.2k SATA drives are quite a bit slower than 15k SAS/FC drives. But get a dozen or two of those SATA spindles in an array, and you've got some (relative to cost) serious performance.

Re:Its just not the same thing. (1)

Draconian (70486) | more than 7 years ago | (#19320613)

48 disks and hundreds of GB/s? That works out to over 2 GB/s per disk. A good disk gives about 100MB/s sustained, or less. Come to think of it, memory speeds are rarely that fast; I think only the fastest graphics cards come to 100GB/s.
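A quick sanity check backs the parent up, assuming roughly 80MB/s sustained per 7,200rpm SATA drive of that era:

    # Aggregate sustained throughput of the 48-drive box discussed above.
    drives, per_drive_mb_s = 48, 80        # assumed per-drive sustained rate
    print(f"~{drives * per_drive_mb_s / 1000:.1f} GB/s from raw spindles")   # ~3.8 GB/s
    # So "hundreds of gigabytes per second" can only describe cache hits; the
    # grandparent presumably meant hundreds of *megabytes* per second.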

Re:Its just not the same thing. (2, Insightful)

Firethorn (177587) | more than 7 years ago | (#19320047)

Sure, a good $20k NAS RAID does more. Question is, is it really needed?

I mean, you could deploy 10 times as many of these as you could your array, giving you 10 times the storage.

Depending on what you do, it has to hit the network sometime. For example - we have a big expensive SAN solution. What's it used for? As a massive shared drive.

For a fraction of the price we could have put a similarly sized one of these in every organization. Sure, it wouldn't be able to serve quite as much data, but most of our stuff isn't accessed extremely often anyway.

I've noted before that in many cases it'd be cheaper for us to back up to IDE hard drives than to tape. The tape alone costs more per megabyte than a HD, and has slower transfer rates to boot. Our backup solution could be several of these, connected by fiber to another building (so we don't have to move them).

Re:Its just not the same thing. (5, Informative)

tgatliff (311583) | more than 7 years ago | (#19320073)

It is not my intention to offend, but I always love it when I hear the dreaded marketing phrase of hardware "tested to a higher level of quality".

I work in the world of hardware manufacturing, and I can tell you that this "magical" extra testing process simply does not exist. Hardware failures are always expensive, and we do anything we can to prevent them. To do this, we build burn-in procedures based on what most call the 90% rule, but you really cannot guarantee more reliability beyond that. Past that point, reliability is determined by better device design. Any person who says differently either does not completely understand individual test harness processes or does not understand how burn-in procedures work.

In short, more money is not necessarily better. Higher-volume designs typically are, though...

At $20k? (0)

Anonymous Coward | more than 7 years ago | (#19320101)

It uses SATA drives (assuming it's big enough to be called an array, rather than being five disks shoved in a 1U box). If you want FC or SAS, you're looking at $50K on up -- probably up.

Re:Its just not the same thing. (1)

llZENll (545605) | more than 7 years ago | (#19320113)

I guess you missed the Google disk report stating that expensive fiber drives have the same reliability as SATA and IDE drives. The only benefit of a fiber drive in reality is that they tend to have a higher RPM, which translates to more IOPS, and even that can be had with the SATA Raptor. There is little reason to use fiber other than wasting your IT department's money and falsely inflating someone's ego.

Re:Its just not the same thing. (0)

Anonymous Coward | more than 7 years ago | (#19320349)

> which are tested to a higher level of quality

Oh gawd!

Can I interest you in a bridge? It's a famous historical landmark I'm selling on behalf of a wealthy client and I'm prepared to cut you a great deal.

Re:Its just not the same thing. (1, Insightful)

terminal.dk (102718) | more than 7 years ago | (#19320443)

You did not read the reports out recently on drive reliability.

SCSI vs. IDE is a non-issue; newer-technology drives being better than older drives is a bigger factor. So SATA lasts as long as SCSI. Performance-wise, for sequential access a 500GB 7200RPM disk beats a 10,000RPM 72GB or 144GB SCSI any day. In my laptop, the 160GB 5400RPM disk is faster than the 7200RPM 100GB disk.

The advantage of the SCSI disk is command queuing, so that it can stop on the way from sector A to C and read sector B. But this is also implemented in SATA-300 drives, so that advantage is gone from SCSI as well.

I have been a big SCSI fanboy myself, but the magic has gone. SATA has flown past SCSI.

But what drives do you use? (1)

Puchku (615680) | more than 7 years ago | (#19319917)

Do you use consumer level drives, or enterprise level drives? You have not specified that. The cost varies.

Re:But what drives do you use? (1)

bemenaker (852000) | more than 7 years ago | (#19320059)

Use consumer-grade drives. The Google drive failure report basically shows no difference between them.

Re:But what drives do you use? (1)

Sancho (17056) | more than 7 years ago | (#19320103)

Had to be consumer level, since the drive cage he wrote about is $1300 by itself.

123 Incorporate ! (1)

bytesex (112972) | more than 7 years ago | (#19319949)

Well I must say it's true: a company is born on /. every minute.

Re:123 Incorporate ! (0)

Anonymous Coward | more than 7 years ago | (#19320085)

Reminds me of hearing this yesterday [npr.org] .

No (5, Informative)

iamredjazz (672892) | more than 7 years ago | (#19319955)

Speaking from personal experience - this file system is far from ready. It can kernel panic and reboot after minor I/O errors; we were hosed by it and probably won't ever revisit it. This phenomenon can be repeated with a USB device; you might want to try it before you hype it. Try a Google search on it and see what you think... there is no fsck or repair, so once it's hosed, it's hosed, and the recovery is to go to tape. http://www.google.com/search?hl=en&q=zfs+io+error+kernel+panic&btnG=Google+Search [google.com]

Re:No (0)

Anonymous Coward | more than 7 years ago | (#19320069)

Plus, how long do you think an fsck would take on a really large ZFS?


I don't fully understand the lust people have for a filesystem. Sure, 128-bit is bigger than 64-bit, but in all but the most extreme situations even 64-bit is practically future-proof.

I don't know, I've been doing this a really long time and only a couple of times have I ever noticed a great difference changing filesystems. Seems like you shouldn't even notice any more, it's just a slightly different configuration process.

That's the idea (1)

complete loony (663508) | more than 7 years ago | (#19319957)

ZFS has a lot of potential. However the current implementation of ZFS has its limits, and you should know what they are before you commit to maintaining a server running it.

It depends (0)

Anonymous Coward | more than 7 years ago | (#19319959)

I'd say this highly depends on your usage of the NAS/SAN. I, for example, have set up a FreeBSD NAS (Samba share) on a software RAID-3 configuration. For the most part this is mainly a storage setup (write once, read often). This coupled with my Gb network works well enough for me. It's fast enough that I'm able to burn DVDs over the network and look at files in real time (and watch videos without any lag). If you're looking to implement a high volume of reads/writes with your setup, you should look into using a hardware-based RAID configuration (RAID-Z is nice and all, but the parity is still calculated in software, which can take up more cycles than necessary). I see absolutely no point in buying a pre-made NAS/SAN, as all they've done is pre-set everything up.

Again, determine your situation: if it's mainly storage, then your RAID-Z1 setup should be fine. But if you're going for very high I/O, I would recommend using a hardware-based RAID setup. In either case Gb Ethernet should be fast enough (make sure you use Cat 6 and not Cat 5e).

Reliable? (4, Informative)

Jjeff1 (636051) | more than 7 years ago | (#19319979)

Businesses buy SANs to consolidate storage, placing all their eggs in one basket. They need redundant everything, which this doesn't have. Additionally, SATA drives are not as reliable long term as SCSI. Compare the data sheets for Seagate drives: they don't even mention MTBF on the SATA sheet [seagate.com].
Businesses also want service and support. They want the system to phone home when a drive starts getting errors, so a tech shows up at their door with a new drive before they even notice there are problems. They want to have highly trained tech support available 24/7 and parts available within 4 hours for as long as they own the SAN.
Finally, the performance of this solution almost certainly pales compared to a real SAN. These are all things that a home-grown solution doesn't offer. Saving $47K on a SAN is great, unless it breaks 3 years from now and your company is down for 3 days waiting for a replacement motherboard off eBay.
That being said, everything has a cost associated with it. If management is OK with saving actual money in the short term by giving up long-term reliability and performance, then go for it. But by all means, get a rep from EMC or HP in so the decision makers completely understand what they're buying.

Re:Reliable? (1)

Wdomburg (141264) | more than 7 years ago | (#19320035)

Not that I think that home grown storage is necessarily a good fit for
To be fair, Seagate does list MTBF if you look at the data sheet [seagate.com] for SATA drives actually sold for enterprise applications.

I'll agree, however, that home-grown solutions are only appropriate for a limited number of applications where outlay costs are more important than reliability and support.

Re:Reliable? (0)

Anonymous Coward | more than 7 years ago | (#19320171)

They want the system to phone home when a drive starts getting errors, so a tech shows up at their door with a new drive before they even notice there are problems.
We had a SAN where I work phone home to the vendor saying that drives were failing. The support guy went out to the datacenter, went into our cage, and saw that the roof was leaking onto the SAN unit. Apparently the moisture was killing the drives off. Who would have thought that water was bad for powered-on electronics?

Re:Reliable? (2, Insightful)

Ruie (30480) | more than 7 years ago | (#19320407)

A few points:
  • Reliability: if your solution costs $3K instead of $47K, just buy two or three. Nothing beats system redundancy, provided the components are decent.
  • Saving money - in 3 years just buy a new system. You will need more storage anyway.
  • SAS versus SATA: the only company that I know of that makes 300GB 15K RPM drives is Seagate. They cost $1,000 apiece. Compare that to $129-$169 per Western Digital 400GB enterprise drive (see the quick price-per-GB check below). For multi-terabyte arrays it makes a lot of sense to get SATA disks and put the money into RAM. Lots of RAM - like 32GB or 64GB.
  • The big advantage of a Linux system (or BSD if you know it better) is flexibility. Configure Samba the way you like, mount shares from other machines the way you like. Use chattr, setfacl, or whatever else. Run a database on the same machine the RAID controller is on. Use multiple Ethernet adaptors.
  • To the people above - if bandwidth is a problem, a 10Gb adapter now costs $1000. Though I doubt your RAID controller can saturate even one gigabit line under moderate seek load.
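The price gap in the SAS-versus-SATA bullet, using the prices quoted there (taking $149 as a mid-range figure for the 400GB drive):

    # $/GB from the prices quoted in the list above.
    drives = {"300 GB 15K SAS": (1000, 300), "400 GB enterprise SATA": (149, 400)}
    for name, (dollars, gb) in drives.items():
        print(f"{name}: ${dollars / gb:.2f}/GB")
    # Roughly $3.33/GB vs. $0.37/GB -- about a 9x premium per gigabyte, which is
    # the money the parent would rather spend on RAM.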

Support (1)

vaderhelmet (591186) | more than 7 years ago | (#19319983)

In a standard corporate environment where the thought of spending $50,000 has come to the plate, you're probably in deeper than you think. A lot of what you pay for with a big-name SAN from EMC or the like is that you're getting serious support and reliable equipment from a well-established company. Your homebrew method will almost undoubtedly work... but all equipment fails, and when this solution fails, unless you're picky about getting manufacturer warranties, you're going to be losing money to fix the problem. With our EMC/Dell solution, if something goes wrong we'll have a tech with a replacement on site within 4 hours. If your thing fails, can you get an RMA replacement in even under a week?

I'm not trying to discourage you from building this system, in fact I think DIY is a great way to go. However, you do need to take into account how downtime will affect the cost of this device. It is always important to have a failover/replacement plan for when your system goes down because most systems DO go down. (Which is why many of us are even employed.)

Good luck to you, sir!

Re:Support (1)

multipartmixed (163409) | more than 7 years ago | (#19320151)

> I'm not trying to discourage you from building this system, in fact
> I think DIY is a great way to go. However, you do need to take into
> account how downtime will affect the cost of this device.

We do some roll-your-own stuff at my company. The downtime and spares issue is solved simply by stocking spares *ahead of time*. For example, if I need three RAID enclosures for a project, I buy four and leave the fourth one on the shelf. If we need to dip into the spares, THEN we can start shopping for replacements.

We rarely use manufacturer's warranties, simply because it is such a pain. Although, I was dismayed once to call up Seagate with 48 bad disks, only to find out that their warranty had expired the month before. Guess I should've been more proactive..

Re:Support (1)

19thNervousBreakdown (768619) | more than 7 years ago | (#19320225)

With the amount of money that's saved, he could have a complete offline backup system. Go to SCSI and he could multi-host two systems, and have a third to implement an active/passive cluster, and it's STILL less than 1/4 of the cost.

Re:Support (0)

Anonymous Coward | more than 7 years ago | (#19320261)

I agree - an inexpensive Dell/EMC solution is by far the best way to go. We have 25 petabytes of storage from various vendors (HDS/Compaq/IBM/NetApp/EMC), and EMC's service and quality have won us over, so now we are only buying EMC.

At that price he can keep spares (1)

HighOrbit (631451) | more than 7 years ago | (#19320275)

You have a good point about the support with higher value equipment, but at this price he can afford to keep a few spares in the closet, or even have a few other complete units as a failover 'live' backup.

No but... (3, Informative)

Junta (36770) | more than 7 years ago | (#19319989)

ZFS does not obsolete NAS/SAN. However, for many, many instances, DIY fileservers were more appropriate than SAN or NAS setups long before ZFS came along, and ZFS has done little to change that situation (though administering ZFS is more straightforward and in some ways more efficient than the traditional, disparate strategies for achieving the same thing).

I've never gotten the point of standalone NAS boxes. They were never fundamentally different from a traditional server, but they had a premium price attached. I may not have seen the high-end stuff, however.

SAN is an entirely different situation altogether. You could have ZFS implemented on top of a SAN-backed block device (though I don't know if ZFS has any provisions to make this desirable). SAN is about solid performance to a number of nodes with extreme availability in mind. Most of the time in a SAN, every hard drive would be a member of a RAID, with each drive having two paths to power and to two RAID controllers in the chassis, each RAID controller having two uplinks to either two hosts or two FC switches, and each host having two uplinks to the two different controllers or to two FC switches. Obviously, this gets pricey for good reason, which may or may not be applicable to your purposes (frequently not), but the point of most SAN setups is no single point of failure. For simple operation of multiple nodes on a common block device, HA is used to decide which single node owns/mounts any FS at a given time. Other times, a SAN filesystem like GPFS is used to mount the block device concurrently among many nodes, for active-active behavior.

For the common case of 'decently' available storage, a robust server with RAID arrays has for a long time been more appropriate for the majority of uses.

Home-grown storage solutions (1)

GPSguy (62002) | more than 7 years ago | (#19319997)

It's been a couple of years since we built our first multi-terabyte storage array (~1.6TB for ~$5k US) and started the grand experiment with XFS. Of all we've learned, I think the biggest lesson has been that XFS has problems in this realm. We are now spinning over 30TB and likely to double soon. To SAN or not to SAN has become the question, as being able to find our data and metadata is important to us - more important than slinging the files off and accidentally discovering them later.

Some form of indexing system is valuable, especially if you're looking at multiple volumes to span.

We're now looking at lustre and possibly zfs to support our solutions. For hardware, we've gone to ATA-over-Ethernet with CoRAID hardware and been pretty satisfied. It's not iSCSI but it works well and we get adequate transfer rates. We do a caching process when we're anticipating high user access periods and can predict the patterns.

I'd say, for the money, try it, benchmark it, and report back.

NAS and SAN are two very different technologies wi (0)

Anonymous Coward | more than 7 years ago | (#19320019)

NAS and SAN are two very different technologies with different goals.

iSCSI is a block-level protocol that requires an iSCSI soft initiator or iSCSI HBA/ToE for other computers to access it. It is a similar but slower, less secure, and cheaper solution than a normal Fibre Channel SAN.

To do it right you really should have an iSCSI HBA/ToE in both the target and the initiator, as well as a dedicated router that is made to handle iSCSI - because of its inherently "bursty" nature, a lot of routers choke on it.

NAS is file-level transfer that uses NFS and CIFS, which are already built into every OS.

Just did the same.. (0, Redundant)

Padrino121 (320846) | more than 7 years ago | (#19320023)

I have experience with a number of NAS solutions, and if cost weren't an issue or reliability/throughput were paramount, I would continue to purchase them (e.g., NetApp). Depending on the environment they are being installed in, the (perceived) liability and additional complexity of rolling your own can be challenging to overcome.

With that said, for places where rolling your own is an option, I would keep your eye out for a good deal on drives and you will be able to build one for much less. I put together a new Myth backend with the following:

Antec Sonata II - $65 (rebate)
Asus M2N32 Vista edition (it's running Linux, but the Vista edition has an LIRC-supported IR receiver) - $210
AMD 4200+ X2 - $96
2GB RAM - $55
Nvidia 7600 with HDMI out - $110
6 x 500GB Maxtor SATA II HDDs - $600

It's not RAID-Z, but with a standard RAID-5 I have 2.5TB usable storage with HDTV output and ATA/iSCSI targets for $1,136. Not bad, and Linux SW RAID-5 write speed actually screams these days; with this setup I expect 200MB/s write throughput.

One word of caution with RAID-Z: although writes are extremely fast, there is a performance issue around reads if they are small and random, because there will be a lot of cache misses. Relatively speaking it's not that bad, but something to keep in mind when looking at the workload you will be supporting.

DIY has always been an economical option (1)

dogsbreath (730413) | more than 7 years ago | (#19320139)

There is nothing new here; it has always been cheaper to work out your own storage solution than to buy a commercial unit. Same goes for servers/desktops etc. Why buy a Sun (or Dell, or IBM or whatever) when you can get your local guy to put a system together for half the price?

If you are an individual or even a small business with limited capital then DIY is often the best (only?) way to go but you also get to deal with flakey controllers, incompatible drivers, and warranty returns all on your own. The integration of components, performance management, and the harmony of the complete system is all yours.

At some point, either because of the scale or the criticality of the system, it is worth the bucks to pay someone who has researched the issues and built a solid product to provide you with a solution that you can (hopefully) trust. Your sysadmins and techies can spend their time on ROI generating projects instead of figuring out why a component does wild and whacky crap every full moon. Tech support can be a very good thing.

Even open source has its commercial providers. Personally, I have always liked Slackware but if we are deploying servers it's going to be Red Hat.

I think homebrew is super: put your system together and do some benchmarking, then publish it for the rest of us to benefit!

ZFS is great, but... (4, Informative)

Etherized (1038092) | more than 7 years ago | (#19320147)

It's no NetApp - yet. One thing to realize is that iSCSI target isn't even in Solaris proper yet - you have to run Solaris Express or OpenSolaris for the functionality. That may be fine for some people, but it's a deal-breaker for most companies - you're really going to place all those TB of data on a system that's basically unsupported? I'm sure Sun would lend you a hand for enough money, but running essentially a pre-release version of Solaris is a non-starter where real business is concerned. Even when iSCSI target makes it into Solaris 10 - which should be in the next release - are you really comfortable running critical services off of essentially the first release of the technology?

Furthermore, while ZFS is amazingly simple to manage in comparison to any other UNIX filesystem/volume manager, it still requires you to know how to properly administer a Solaris box in order to use it. Even GUI-centric sysadmins are generally able to muddle through the interface on a Filer, but ZFS comes with a full-fledged OS that requires proper maintenance. Your Windows admins may be fine with a NetApp - especially with all that marvelous support you get from them - but ask them to maintain a Solaris box and you're asking for trouble. Not to mention, since it's a real, general purpose server OS, you'll have to maintain patches just like you do on the rest of your servers - and the supported method for patching Solaris is *still* to drop to single user mode and reboot afterwards (yes, I know that's not necessarily *required*).

Also, "zfs send" is no real replacement for snapmirrors. And while ZFS snapshots are functionally equivalent to NetApp snapshots, there is no method for automatic creation and management of them - it's up to the admin to create any snapshotting scheme you want to implement.

Don't get me wrong - I love ZFS and I use it wherever it makes sense to do so. It may even be acceptable as a "poor man's Filer" right now, assuming you don't need iSCSI or any of the more advanced features of a NetApp. In fact, it's a really great solution for home or small office fileservers, where you just need a bunch of network storage on the cheap - assuming, of course, that you already have a Solaris sysadmin at your home or small office. Just don't fool yourself, Filer it ain't - at least not yet.
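On the "no automatic snapshot management" point: before Sun shipped an automatic snapshot service, the usual answer was a small cron job. A hypothetical sketch (dataset name, snapshot prefix, and retention count are all placeholders):

    #!/usr/bin/env python
    # Hypothetical cron job: snapshot one dataset and keep only the newest N
    # auto-snapshots. Run hourly for the poor man's equivalent of scheduled snaps.
    import subprocess, time

    DATASET, KEEP = "tank/share", 24                     # placeholders

    stamp = time.strftime("auto-%Y%m%d-%H%M")
    subprocess.check_call(["zfs", "snapshot", f"{DATASET}@{stamp}"])

    # List snapshots oldest-first and prune this dataset's auto-* ones beyond KEEP.
    out = subprocess.check_output(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-s", "creation"])
    snaps = [s for s in out.decode().splitlines() if s.startswith(f"{DATASET}@auto-")]
    for old in snaps[:-KEEP]:
        subprocess.check_call(["zfs", "destroy", old])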

hard drives will be that big soon (1)

192939495969798999 (58312) | more than 7 years ago | (#19320165)

Based on the progression of drive sizes, a 1.3+ TB drive should be available within a few years, and a 7.0+ TB drive shortly thereafter. If you want cheap storage, just use large single drives. They are super cheap, and while a single drive isn't huge, it is far less complicated than this setup.

Either pay a lot, or roll your own (1)

siddesu (698447) | more than 7 years ago | (#19320181)

Whether you buy or build depends a lot on whom you're buying from. Buying from people who are not in the storage business, even if it's a big corporation like Dell, gives you about the same level of support as rolling your own when the shit hits the fan. Don't believe me? See this thread, and notice how long the problem dragged on and on (I was one of the 'happy' users there):

http://www.dellcommunity.com/supportforums/board/message?board.id=pv_raid&message.id=214&view=by_date_ascending&page=1 [dellcommunity.com]

Buying from EMC or the like (even EMC via Dell) tends to work better. The hardware is expensive, the consulting fee is expensive, and the support is expensive, but at least for that kind of money you can be sure someone will try a bit harder to help.

All in all, it depends on your business. If you are making a zillion a month from that hardware working flawlessly, _not_ paying $200k for the storage is dumb. If you are making little enough so that $50k makes you think about it, rolling your own could be the way to go.

Re:Either pay a lot, or roll your own (0)

Anonymous Coward | more than 7 years ago | (#19320239)

Wow, that's f-ing relevant... 4-year-old stuff. Pssh.

This hardly depends on ZFS... (4, Informative)

Wdomburg (141264) | more than 7 years ago | (#19320187)

This doesn't strike me as having much to do with ZFS at all. You've been able to build a home-grown NAS/SAN box on the cheap for years using commodity equipment. Take ZFS out of the picture and you just need a hardware RAID controller or block-level software RAID (like dmraid on Linux or GEOM on FreeBSD). There are even canned solutions for this, like OpenFiler [openfiler.com].

That being said, this sort of solution may or may not be appropriate, depending on site needs. Sometimes support is worth it.

You're also grossly overestimating the cost of an entry-level iSCSI SAN solution. Even going with EMC, hardly the cheapest of vendors, you can pick up a 6TB solution for about $15k, not $50k. Go with a second tier vendor and you can cut that number in half.

It simply will not scale (1)

peterpressure (940132) | more than 7 years ago | (#19320211)

How many high-bandwidth clients could you possibly serve off a single 125 MB/sec (GigE) connection?

This simply will not scale properly in a corporate environment. Add dozens of GigE, Fibre Channel, or InfiniBand connections, and perhaps we can talk about serving up some serious bandwidth to hundreds of clients.
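To put rough numbers on it (back-of-the-envelope figures, assuming typical protocol overhead):

  1 GigE link: 125 MB/s theoretical, call it ~100 MB/s usable after TCP/iSCSI overhead
  10 clients streaming concurrently -> ~10 MB/s each
  100 clients -> ~1 MB/s each

Fine for a workgroup file share, not for hundreds of busy clients.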

Good start though...

Redundancy (1)

fredr1k (946815) | more than 7 years ago | (#19320243)

What about redundancy? Have you got redundant PSUs, redundant controllers, redundant network paths to and from the server? Obviously the tech itself has some LARGE advantages, but working with enterprise technology makes you think redundancy*3. (No one wants the SQL cluster to fail because a PSU turned out to be a single point of failure.)

Replace NAS? Sure. SAN? No way. (3, Informative)

pyite (140350) | more than 7 years ago | (#19320285)

I guess this setup could replace some people's need for a turnkey NAS solution. But your thinking it could replace SAN solutions shows you haven't looked into SAN too much. To start, there's a reason Fibre Channel is way more popular than iSCSI. The financial services company I work for has about 3 petabytes of SAN storage, and not a drop of it is iSCSI.

Storage Area Networks are special built for a purpose. They typically have multiple fabrics for redundancy, special purpose hardware (we use Cisco Andiamo, i.e., the 9500 series), and a special purpose layer 2 protocol (Fibre Channel). iSCSI adds the overhead of TCP/IP. TCP does a really nice job of making sure you don't drop packets, i.e. layer 3 chunks of data, but at the expense of possibly dropping frames, i.e. layer 2 data. The nature of TCP just does this, as it basically ramps up data sending until it breaks, then slows down, rinse and repeat. This also has the effect of increasing latency. Sometimes this is okay - people use FCIP (Fibre Channel over IP), for example. But sometimes it's not.

Fibre Channel does not drop frames. In addition, Fibre Channel supports cool things like SRDF [wikipedia.org], which can provide atomic writes in two physically separate arrays. (We have arrays 100 km away from each other that get written basically simultaneously, and the host doesn't think its write is good until both arrays have written it.)

So, like I said, this might be good for some uses, but not for any sort of significant SAN deployment.

Cheap, redundant, and performant storage. (4, Interesting)

Lethyos (408045) | more than 7 years ago | (#19320297)

Google has a great solution [storagemojo.com] that focuses on the “cheap” part without compromising much on the latter two. If you have not read up on the Google File System, definitely take the time to do so. At the very least, it seems to call into question the need to shell out tens of thousands for high-end storage solutions that promise reliability in proportion to the dollar.

Re:Cheap, redundant, and performant storage. (3, Informative)

TheSunborn (68004) | more than 7 years ago | (#19320579)

But the Google File System is not available for purchase, which is a shame.

And hiring a team to develop something similar to the Google File System is not cheap. Even high-end SANs will be cheaper.

EMC Dell solution (1)

Leadmagnet (685892) | more than 7 years ago | (#19320309)

I would go with the Dell EMC AX150i SP Array for an iSCSI solution of that size - it can do iSCSI up to 6TB.

No (2, Insightful)

drsmithy (35869) | more than 7 years ago | (#19320389)

Potentially it will obsolete low-end NAS/SAN hardware (e.g. Dell/EMC AX150i, StoreVault S500) in the next couple of years, for companies that are prepared to expend the additional staff time rolling their own and managing it (a not-insignificant cost - easily adding up to thousands of dollars or more a year). There's a lot of value in being able to take an array out of a box, plug it in, go to a web interface and click a few buttons, then forget it exists.

However, your DIY project isn't going to come close to the performance, reliability, and scalability of even an off-the-shelf mid-range SAN/NAS device using FC drives, multiple redundant controllers, and power supplies - even if the front end is iSCSI.

Not to mention the manageability and support aspects. When you're in a position to drop $50k on a storage solution, you're in a position to be losing major money when something breaks, which is where that 24x7x2hr support contract comes into play, and hunting around on forums or running down to the corner store for some hardware components just isn't an option.

ZFS also still has some reliability aspects to work out - e.g. hot spares. Plus, there isn't a non-OpenSolaris release that offers iSCSI target support yet, AFAIK.
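(For what it's worth, recent Nevada/OpenSolaris builds do have basic spare support; roughly, it looks like the sketch below - the pool and device names are made up, and how well spares behave in practice is exactly the open question.)

  # create a RAID-Z pool with one hot spare
  zpool create tank raidz c1t1d0 c1t2d0 c1t3d0 c1t4d0 spare c1t5d0
  # or bolt a spare onto an existing pool
  zpool add tank spare c1t5d0
  zpool status tank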

I've looked into this sort of thing myself, for both home and work - and while it's quite sufficient for my needs at home, IMHO it needs 1 - 2 years to mature before it's going to be a serious alternative in the low-end NAS space.

CORAID and ATA over Ethernet (1)

backtick (2376) | more than 7 years ago | (#19320393)

Buy one of CORAID's 1521 disk shelves with their CLN20 front end for $6,600 and drop in 15 500 GB SATA drives (they're a whopping $100 each these days) for a quick 7 TB of raw storage at roughly $8K-$9K. Need more storage? Go with 750 GB drives (they're validating 1 TB drives now, but the price per GB isn't worth it). Want to add storage later? Buy another 1521 and plug it in. Oh, and it's AoE, with less overhead than iSCSI.
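On the Linux client side, AoE really is that light - a minimal sketch, assuming the in-kernel aoe driver plus the aoetools package (the shelf/slot numbers and mount point are made up):

  modprobe aoe                  # load the ATA-over-Ethernet driver
  aoe-discover                  # broadcast for shelves on the local segment
  aoe-stat                      # list what answered, e.g. e0.0 ... e0.14
  mkfs.xfs /dev/etherd/e0.0     # an exported LUN shows up as a plain block device
  mount /dev/etherd/e0.0 /mnt/shelf0

No TCP stack in the data path, which is where the low overhead comes from.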

Why not good ol' trusted Linux? (5, Informative)

Britz (170620) | more than 7 years ago | (#19320399)

Linux has had more performance testing on x86 than OpenSolaris (so you are less likely to run into a bad bottleneck). On Linux you can create RAID-1, -4, -5, and -6 under Multiple Device (software RAID) support in the kernel, and then use mkraid (or, these days, mdadm) to pull in all the drives you want. This code is not new at all; it was stable in 2.4, maybe even in 2.2.

After that you just create a filesystem on top of the RAID. If you don't like ext3 or don't trust it, there is always XFS. I have had some rough times with ReiserFS, XFS, and ext3, and for all that experience I would go with XFS for long-running server environments (now flame away for that little bit; use ext3 all you want).

The advantage is that you use very well-tested code.
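A minimal sketch of that setup with current tooling (mdadm rather than the old raidtools; the device names, RAID level, and mount point are assumptions):

  # build a 4-disk RAID-5 array out of whole disks
  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
  cat /proc/mdstat              # watch the initial resync
  # put a filesystem on top and mount it
  mkfs.xfs /dev/md0
  mount /dev/md0 /export/storage

From there you export it over Samba, NFS, or an iSCSI target of your choice.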

The problem comes with hot-swapping. I don't know if the Linux drivers are up to that yet, but I also highly doubt that OpenSolaris SATA drivers for some low-price chip in a low-price storage box can deal with hot-swapping, so Linux might well be ahead on that one.

That is the setup I would compare to a plug-and-play SAN solution, and it totally depends on the environment. If the Linux box goes down for some reason for a couple of hours or days, how much will that cost you? If it's more than twice the price of the SAN solution, you might as well buy the SAN - and if the SAN fails, you just pull the disks and put them in a new unit. I don't know whether that trick would work on the Linux box.

Hardware RAID (1)

ehfortin (678151) | more than 7 years ago | (#19320463)

I was reading the various comments and was surprised that nobody suggested a hardware RAID controller. There are some from well-known businesses (Adaptec, Promise, and so on) that support 4, 8, or more SATA drives and do RAID 1/5/1+0. They usually support hot spares, RAID level changes, volume growth, etc. These are available from about $300, with the median for 8 drives at about $500. Using this kind of solution would be faster and more robust (make sure you get a full RAID chipset, not just a RAID accelerator where a combination of software and hardware is necessary) and should be easier to manage than a ZFS setup over OpenSolaris. They are also often available for multiple OSes.

I'm looking at this kind of solution for my personal LAN. I haven't had time yet to order the hardware and the disks, so I have no experience doing it, but on paper it looks good. At this point I'm looking at an Adaptec 2820SA, which supports 8 drives and offers a lot of interesting features. Does anybody have comments on taking that road, or on the Adaptec 2x20SA in particular?

FUD alert! (0)

Anonymous Coward | more than 7 years ago | (#19320517)

Once upon a time a new outside broadcast truck was being built. The new boss changed the design to use domestic televisions instead of $5,000 D1 monitors. Guess what? The TV sets fell apart and the boss became a laughing stock. The proper monitors had to be put in, at great expense.
Computer gear is a bit different. Six years ago a 100 GB real-time disk array cost $100,000. A cheaper 100 GB of storage could have been built back then with a Pentium PC and four 33 GB IDE drives RAIDed together, setting you back $3,000 or so. A spare could have been built for another $3,000 and the systems rsync'ed on a daily basis. Perhaps new disks would have been bought in the years between, maybe another $3,000 worth. Assuming the 'linux' box did the job and delivered the 1's and 0's quickly enough, what would you prefer: $90,000 in the bank, or a very expensive and not-so-powerful real-time disk array?
As it worked out, we paid for the $100,000 box. It did mess up a few times, even needing new drives. It also sounded like a rocket, which is not indicative of energy efficiency. Not one client gave a damn about what we were using for storage; however, we did have to get them to pay lots of money for such toys, and we did have to manage the financial risk.
Looking back, I think the homebrew solution could have been more fun, perhaps giving us a business edge and an opportunity to gain useful skills instead of niche knowledge.
Clearly the solution has to suit the application, and in this case some budget could be spent on cache RAM, hopefully making for a really interesting setup. If things are DB-intensive, then some of the indexes could stay in the cache RAM, maybe giving better performance than the $50,000 SAN with no application rewrites.

This will work (1)

gweihir (88907) | more than 7 years ago | (#19320523)

I have had a Linux fileserver with 5 TB for some time now; the only issue ever was a dead PSU. Fixed that by replacing the Fortron with an Enermax.

The only thing I would do differently now is use PCI-E SATA controllers for bandwidth (my server has PCI). Linux software RAID is perfectly up to the task. ZFS should be, too.

One thing: do temperature and SMART monitoring on all drives and run a long SMART self-test every two weeks or so. Also have the RAID and SMART monitoring software send email on problems, and have at least one spare disk ready to use.
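For reference, a minimal sketch of that monitoring with smartmontools (the device name and mail target are assumptions; the smartd line is adapted from the stock example config):

  # one-off checks on a drive
  smartctl -a /dev/sda          # health, attributes, temperature
  smartctl -t long /dev/sda     # kick off a long self-test

  # /etc/smartd.conf - monitor everything, short self-test nightly at 02:00,
  # long self-test Saturdays at 03:00, mail problems to root
  DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m root

Pair that with mdadm's own "--monitor --mail" (or the ZFS equivalent) and a cold spare on the shelf.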

Another option (1)

doubledjd (1043210) | more than 7 years ago | (#19320545)

I'm not that well versed in this stuff, so make sure to do your own research (the disclaimer is now out of the way).
You can't compare ZFS to a SAN. I didn't need a SAN, but I did look at NAS. NetApp is great but still had a few limitations:
  • The disks were proprietary.
  • Redundancy: going with the redundant head meant bumping up to a whole new level of cost.
  • The chassis size was fixed.
  • The cost ran up fast for the software to do the things I wanted - all proprietary (which was alright, considering).
Still, NetApp is quite a quality product.
I wanted a cheaper solution that had more to it, so I looked at Coraid:
  • They use AoE (ATA over Ethernet), a protocol that is lighter than TCP/IP.
  • It can run with off-the-shelf SATA drives.
  • A gateway can be used if you want to add more shelves.
  • They have a redundant gateway solution as well.
We'll be putting it in this quarter.
Having said this, I'm unsure of the state of LVM2 for block-level snapshots; that was one thing NetApp did well. Also, be selective about the drives you choose. This is only another option you can look into.
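On the LVM2 question: block-level snapshots have been there for a while. A minimal sketch, assuming a volume group named vg0 with a logical volume named data (both names are made up), though the copy-on-write performance is the part worth testing:

  # reserve 10 GB of copy-on-write space for a snapshot of vg0/data
  lvcreate --snapshot --size 10G --name data_snap /dev/vg0/data
  # mount it read-only for backups or browsing old state
  mount -o ro /dev/vg0/data_snap /mnt/snap
  # drop it when done
  umount /mnt/snap
  lvremove -f /dev/vg0/data_snap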

Openfiler.com (0)

Anonymous Coward | more than 7 years ago | (#19320615)

Take a look at this solution: www.openfiler.com. It supports the CIFS, NFS, FTP, iSCSI, and WebDAV protocols and provides snapshots, quotas, and Active Directory integration. It can use built-in storage or act as a gateway to existing Fibre Channel or iSCSI storage. Plus, you can get commercial support for it.