
Which OSS Clustered Filesystem Should I Use?

Unknown Lamer posted more than 2 years ago | from the deleting-is-so-90s dept.

Open Source

Dishwasha writes "For over a decade I have had arrays of 10-20 disks providing larger-than-normal storage at home. I have suffered complete data loss twice: once because I had accidentally left the notifications disabled on my hardware RAID, and again when an array power supply failed and the RAID controller was unable to recover half of the entire array. Now I run RAID-10, manually verifying that each mirrored pair is properly distributed across each enclosure. I would like to upgrade the hardware, but I am currently severely tied to my RAID hardware, so I would like to take a more hardware-agnostic approach by utilizing a cluster filesystem. I currently have 8TB of data (16TB raw storage) and am very paranoid about data loss. My research has yielded three possible solutions: Lustre, GlusterFS, and Ceph." Read on for the rest of Dishwasha's question. "Lustre is well accepted and used in 7 of the top 10 supercomputers in the world, but it has been sullied by Oracle's buyout of Sun. Fortunately the creator seems to have Lustre back under control via his company Whamcloud, but I am still hesitant to pick something once affiliated with Oracle, and the solution also appears to be a bit more complex than I need. Right now I would like to reduce my hardware requirements to two servers total, with an equal number of disks each, serving as both filesystem cluster servers and KVM hosts."

"GlusterFS seems to be gaining a lot of momentum now having backing from Red Hat. It is much less complex and supports distributed replication and directly exporting volumes through CIFS, but doesn't quite have the same endorsement as Lustre."

"Ceph seems the smallest of the three projects, but has an interesting striping and replication block-level driver called Rados."

"I really would like a clustered filesystem with distributed, replicated, and striped capabilities. If possible, I would like to control the number of replications at a file level. The cluster filesystem should work well with hosting virtual machines in a high-available fashion thereby supporting guest migrations. And lastly it should require as minimal hardware as possible with the possibility of upgrading and scaling without taking down data."

"Has anybody here on Slashdot had any experience with one or more of these clustered file systems? Are there any bandwidth and/or latency comparisons between them? Has anyone experienced a failure and can share their experience with the ease of recovery? Does anyone have any recommendations and why?"


The Cloud, obviously. (0, Funny)

Anonymous Coward | more than 2 years ago | (#37902878)

Wait a minute. I'm a manager, and I've been reading a lot of case studies and watching a lot of webcasts about The Cloud. Based on all of this glorious marketing literature, I, as a manager, have absolutely no reason to doubt the safety of any data put in The Cloud.

The case studies all use words like "secure", "MD5", "RSS feeds" and "encryption" to describe the security of The Cloud. I don't know about you, but that sounds damn secure to me! Some Clouds even use SSL and HTTP. That's rock solid in my book.

And don't forget that you have to use Web Services to access The Cloud. Nothing is more secure than SOA and Web Services, with the exception of perhaps SaaS. But I think that Cloud Services 2.0 will combine the tiers into an MVC-compliant stack that uses SaaS to increase the security and partitioning of the data.

My main concern isn't with the security of The Cloud, but rather with getting my Indian team to learn all about it so we can deploy some first-generation The Cloud applications and Web Services to provide the ultimate platform upon which we can layer our business intelligence and reporting, because there are still a few verticals that we need to leverage before we can move to The Cloud 2.0.

Re:The Cloud, obviously. (-1)

Anonymous Coward | more than 2 years ago | (#37902898)

"The cloud" is retarded. Just call it a redundant array of servers for fucks sakes.

Re:The Cloud, obviously. (0)

VJmes (2449518) | more than 2 years ago | (#37903660)

My father wants to start a new cloud storage company called blackhole.

At least they'd be honest about their business...

Re:The Cloud, obviously. (0)

GunFodder (208805) | more than 2 years ago | (#37903664)

Kudos for hitting virtually all of the important buzzwords! MVC is far too old for your usage, though; you should have used RESTful instead.

Re:The Cloud, obviously. (0)

Anonymous Coward | more than 2 years ago | (#37903730)

Yeah, a reference to noSQL of some sort would have been nice to round it out too.

Repeat after me: (5, Insightful)

Anonymous Coward | more than 2 years ago | (#37902892)

RAID is not a backup solution!

Re:Repeat after me: (5, Insightful)

NFN_NLN (633283) | more than 2 years ago | (#37903396)

Parent is currently modded "0" but is dead on. The submitter's opening statement describes data loss (twice!), says he is "very paranoid about data loss," and his closing remarks ask about "ease of recovery." Those statements suggest you are primarily concerned about data loss.

Clustered filesystems are complex software that specialize in concurrent server access, not increased redundancy.

You need to research backups and/or remote replication. Or buy an enterprise file server that does everything, including calling home when it detects a hardware issue, rather than waste time on a CFS.

Re:Repeat after me: (2)

NFN_NLN (633283) | more than 2 years ago | (#37903440)

And don't forget about RPO. If you want synchronous file replication over any useful distance, we're talking $$$. If asynchronous is acceptable, then decide what an acceptable RPO is, along with your data change rate. With those you can decide whether you can afford offsite replication. Most businesses decide nightly tapes are acceptable at that point.
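As a back-of-the-envelope sketch with made-up numbers (say 20 GB/day of changed data over a 10 Mbit/s uplink):

    # Daily change rate vs. daily link capacity, both in Mbit:
    echo $(( 20 * 8 * 1024 ))   # 163840 Mbit/day of changes to ship
    echo $(( 10 * 86400 ))      # 864000 Mbit/day the uplink can carry
    # ~19% utilization: async replication fits, and the achievable RPO
    # is bounded by how often you kick off a sync, not by bandwidth.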

Re:Repeat after me: (0)

Anonymous Coward | more than 2 years ago | (#37903740)

Except when they do support redundancy:

http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Creating_Replicated_Volumes [gluster.com] - Replicated volumes replicate files throughout the bricks in the volume. You can use replicated volumes in environments where high-availability and high-reliability are critical.

http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Creating_Distributed_Striped_Volumes [gluster.com] - Distributed striped volumes stripe data across two or more nodes in the cluster. For best results, you should use distributed striped volumes where the requirement is to scale storage and in high concurrency environments accessing very large files is critical.

http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Creating_Distributed_Replicated_Volumes [gluster.com] - Distributes files across replicated bricks in the volume. You can use distributed replicated volumes in environments where the requirement is to scale storage and high-reliability is critical. Distributed replicated volumes also offer improved read performance in most environments.
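For reference, creating these volume types boils down to one command apiece; a minimal sketch with hypothetical server names and brick paths (Gluster 3.2-era CLI):

    # Two-way replicated volume: one copy of every file on each server
    gluster volume create vol0 replica 2 transport tcp \
        server1:/export/brick1 server2:/export/brick1
    gluster volume start vol0

    # Distributed-replicated: four bricks at replica 2 = two mirrored pairs
    gluster volume create vol1 replica 2 transport tcp \
        server1:/export/brick2 server2:/export/brick2 \
        server1:/export/brick3 server2:/export/brick3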

Re:Repeat after me: (5, Insightful)

NFN_NLN (633283) | more than 2 years ago | (#37903816)

Except when they do support redundancy:

http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Creating_Replicated_Volumes [gluster.com] - Replicated volumes replicate files throughout the bricks in the volume. You can use replicated volumes in environments where high-availability and high-reliability are critical.

RAID is still NOT A BACKUP!

I have a 500 node replicated filesystem... and I just overwrote the wrong file, or a virus infected a file, or the file got corrupted...

The good news is my 500 replicated nodes are all consistent. The bad news is... where's my fucking file!

Re:Repeat after me: (2)

afabbro (33948) | more than 2 years ago | (#37903828)

Clustered filesystems are complex software that specialize in concurrent server access, not increased redundancy.

Bingo. Spot on perfect answer.

Re:Repeat after me: (0)

Anonymous Coward | more than 2 years ago | (#37903466)

No kidding... I have 10 TB of RAID-5 disk on my file server... you know what I did? Spent a few hundred bucks on external USB hard drives that I do periodic backups onto. I rarely have problems (in fact I can only think of one time I had a few corrupt files)... I restored those files from backup and everything was happy.

Re:Repeat after me: (1)

Enfixed (2423494) | more than 2 years ago | (#37903542)

Heh, just buy a bunch of 3TB My Books on Black Friday and call it a day. ;)

You Should... (0, Flamebait)

Anonymous Coward | more than 2 years ago | (#37902904)

Get a girlfriend.

Re:You Should... (2, Insightful)

RobDollar (1137885) | more than 2 years ago | (#37902968)

If ever there was an article that made the case for your comment, this is it. Dishwasha, what the living fuck are you doing with your life? Answer that and then maybe, just maybe, coherent answers will abound.

Re:You Should... (0)

Anonymous Coward | more than 2 years ago | (#37903056)

Home users with that much storage space never admit that it's packed full of the filthiest porn known to mankind. They'll always claim that it's used to store "high-def recordings from their home audio studio".

Re:You Should... (0)

Anonymous Coward | more than 2 years ago | (#37903112)

Actually, it's pirated movies, TV, games, and software.

Re:You Should... (0)

Anonymous Coward | more than 2 years ago | (#37902990)

With 8TB of porn, who needs a girlfriend?

Re:You Should... (2)

slaker (53818) | more than 2 years ago | (#37903230)

As someone with considerably more than 8TB of porn (and a similarly vast quantity of non-porn content, handily digitized and indexed), until recently I used paired servers each holding 12TB of drives in RAID6 with 2 drives as hot spares (64 physical drives on four machines). I used rsync to maintain a second copy of all my data. I've decided that's insane, and I've moved to using a single 36TB FreeBSD server (running zfs for my storage pools) that has enough internal expansion to accommodate another 36TB without getting into external expanders. I've paired that with an LTO4 changer that I bought off Craigslist for around $1900. At the moment I have just enough tapes to have two complete copies of my data. I'd like to get another hundred tapes so I can comfortably manage grandfather-father-son backups and have some spares in reserve.

I really don't have any confidence in common RAID with large arrays of large drives, since the possibility of a hard error during a rebuild or resync is too high for comfort. Large data sets really need to be mirrored and if at all possible stored in some offline fashion. That's really the only path to reliable storage.

Re:You Should... (2)

JWSmythe (446288) | more than 2 years ago | (#37903638)

    I have to ask, what the hell are you going to do with 8TB of porn? What's the total runtime of all of that?

    Consider the whole Doctor Who series [slashdot.org]: 202GB is almost 11 days, 20 hours of runtime. Assuming roughly the same ratio (which may allow for higher-resolution video with better compression), and rounding 202GB at 11 days 20 hrs down to 11 days (giving you bigger files per hour), you'd be looking at roughly 435 days.

    If you beat your meat for an hour a day, every day, you'd have 10,455 days (or 28.6 years) of masturbation material.

    If you're just a perv, and for some reason like to have porn playing to enhance the ambiance of your home (which may be a bit funky with the aroma of semen and lube), assuming you sleep for 8 hours a day and don't need to have the porn playing while you sleep, you could leave it playing for every waking hour for 1.8 years before you ever watched the same smut twice.

    Based on those numbers, you haven't viewed all the videos to even ensure you downloaded what you think you did. You most likely have a significant number of malware-infested decoy videos. Well, unless you believe that all those DRM signups and codec suggestions are legitimate.

    So, based on this, why the hell do you need, or think you need, all that stuff and storage? There is a word for it: "hoarder". You should consider asking your shrink about disposophobia, and about hypersexuality through masturbation. You can get help. It will save you a fortune in lube and unnecessary computer gear.

   

Re:You Should... (0)

Anonymous Coward | more than 2 years ago | (#37903824)

It's not about having porn exactly. It's about the fact that it was exceedingly easy to build a vast amount of interesting content that I could use for a sort of metadata tagging and sorting system I've been playing with on my own time for the last several years. Until very recently, adult material could be found in absolutely staggering quantities with zero possibility of legal repercussions. It's just not possible to do that for TV shows or movies without the added risk of legal complication.

I think of what I'm doing more from the standpoint of a librarian or curator; the collecting and organizing is as much if not more interesting than the content.

ReiserFS (0, Offtopic)

apparently (756613) | more than 2 years ago | (#37902966)

-- if it doesn't succeed in protecting your data, it'll make your wife die trying.

Re:ReiserFS (0, Funny)

Anonymous Coward | more than 2 years ago | (#37903014)

Hans Reiser is innocent. He was unjustly convicted by a corrupt court system manipulated by Microsoft.

Re:ReiserFS (3, Insightful)

KendyForTheState (686496) | more than 2 years ago | (#37903058)

Uh... he DID confess to the crime AND lead the cops to his wife's body. I know...sarcasm, right?

Re:ReiserFS (-1)

Anonymous Coward | more than 2 years ago | (#37903298)

He was railroaded into accepting a plea bargain, lest he get the death penalty for not cooperating.

Nice try at shilling though.

Through the looking glass here! (0)

Anonymous Coward | more than 2 years ago | (#37903300)

Of course he did, because what kind of frame-up would it be if he didn't confess, and reveal the body?

They clearly threatened him with something to get him to cooperate. If they're going to suborn the justice system, why stop there? Why not actually kill the woman, and then threaten the father if he doesn't confess to the crime?

This paranoid conspiracy has been brought to you by the letters U,F,O and the number 52.

Re:ReiserFS (0)

Anonymous Coward | more than 2 years ago | (#37903118)

*cough* MurderFS *cough*

Re:ReiserFS (0)

Anonymous Coward | more than 2 years ago | (#37903190)

That's awful, dude. :( Have some decency.

Obligatory: RAID is not a backup (5, Insightful)

Anthony Mouse (1927662) | more than 2 years ago | (#37902982)

Is the only reason you're looking at a clustered filesystem that you don't want to lose data? Because if so, it's probably not what you want. The purpose of a clustered filesystem is to minimize downtime in the face of a hardware failure. You still need a backup in case of a software failure, or in case you fat-finger something, because a mass deletion will replicate to all copies.

Re:Obligatory: RAID is not a backup (2, Informative)

chrb (1083577) | more than 2 years ago | (#37903326)

If you have more than one server, then it's pretty easy to set up rsync with rolling backups (rsnapshot or rdiff-backup or whatever), which is more of a proper backup solution. It's also probably a bit easier to administer than a cluster FS.
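The trick that rsnapshot and friends automate is rsync's --link-dest; a hand-rolled sketch with hypothetical paths:

    # daily.0 looks like a full copy, but anything unchanged since daily.1
    # is a hard link, so each snapshot only costs the changed files.
    rsync -a --delete --link-dest=/backup/daily.1/ /data/ /backup/daily.0/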

Having said that, Hadoop's HDFS [apache.org] looks quite good. AFAIK it is pretty robust, and it runs on top of an existing FS so you won't need to repartition, which is useful. FUSE file system driver, and Java, will be a bit slower than in-kernel, but probably not an issue for bulk data storage.

Oh, and another option is the Distributed Replicated Block Device [wikipedia.org] . Though this is basically network RAID and not replication on a per file basis.
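A minimal two-node DRBD resource looks something like this; the host names, addresses, and devices here are hypothetical:

    # Saved as /etc/drbd.d/r0.res; protocol C means writes complete on both nodes.
    cat > /etc/drbd.d/r0.res <<'EOF'
    resource r0 {
      protocol C;
      on alpha {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.10:7788;
        meta-disk internal;
      }
      on beta {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.11:7788;
        meta-disk internal;
      }
    }
    EOF
    drbdadm create-md r0 && drbdadm up r0   # then promote one node to primary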

Re:Obligatory: RAID is not a backup (1)

Anonymous Coward | more than 2 years ago | (#37903414)

HDFS is a really, really /really/ bad suggestion considering the user's requirements. There is zero reason to use HDFS if you're not using Hadoop itself for processing. Period. The FUSE plugin has probably improved, but last I used it, it was several levels of hell to deal with; we put up with it purely because it had some benefits as a side-band way of injecting data into our Hadoop processing.

Beyond that, HDFS still has a single point of failure: the metadata/name node (the rest are "bricks", in GlusterFS terminology). Which is pretty contrary to what the dude was looking for, considering his description above...

Re:Obligatory: RAID is not a backup (1)

Anonymous Coward | more than 2 years ago | (#37903592)

Having said that, Hadoop's HDFS [apache.org] looks quite good. AFAIK it is pretty robust, and it runs on top of an existing FS so you won't need to repartition, which is useful. FUSE file system driver, and Java, will be a bit slower than in-kernel, but probably not an issue for bulk data storage.

HDFS is not a solution. It doesn't provide POSIX capabilities such as random writes and altering existing files. Although FUSE lets you mount it and make it look like a regular FS, you need to make sure apps that use it stick to the features it supports; otherwise, the apps will start getting errors on disk operations and potentially go down in flames when they try to save files or some such thing.

Re:Obligatory: RAID is not a backup (1)

SuperQ (431) | more than 2 years ago | (#37903450)

And of course what the poster really wants is a DISTRIBUTED filesystem, not a clustered filesystem.

Re:Obligatory: RAID is not a backup (2)

Enfixed (2423494) | more than 2 years ago | (#37903464)

Totally agree; the clustered approach doesn't seem to solve the problem posed. It's simple: buy a bunch of 2TB drives and set them up with ZFS. Configure a nightly snapshot job to another similar machine and call it a day. You can have a larger storage area with a fully redundant backup for less than $2K in parts.
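A sketch of such a nightly job, assuming a dataset named tank/data and a second box reachable as backuphost (both names hypothetical):

    TODAY=$(date +%F); YESTERDAY=$(date -d yesterday +%F)
    zfs snapshot tank/data@"$TODAY"
    # Send only the blocks that changed since yesterday's snapshot
    zfs send -i tank/data@"$YESTERDAY" tank/data@"$TODAY" \
        | ssh backuphost zfs receive -F tank/data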

Re:Obligatory: RAID is not a backup (1)

Demonantis (1340557) | more than 2 years ago | (#37903522)

He needs to get his priorities in order. I would say RAID is probably what he wants for the most part, for, like you said, hardware failure, plus an online backup service for the stuff he truly needs to back up. I sincerely doubt a single person can amass 8 TB of data that would be critical to have. Having it all is nice, but definitely not realistic.

PronFS (2)

igny (716218) | more than 2 years ago | (#37902992)

Where is PronFS when we desperately need one?

Re:PronFS (0)

Anonymous Coward | more than 2 years ago | (#37903046)

You beat me to it. I knew somebody had to make a comment about 8TB of Pr0n.

Re:PronFS (2)

Jeremi (14640) | more than 2 years ago | (#37903446)

Where is PronFS when we desperately need one?

It's widely available... these days it goes by the name "the Internet".

Nagios Monitoring (1)

Anonymous Coward | more than 2 years ago | (#37903018)

Would recommend you look at Nagios monitoring; you can monitor your RAID with it. It has saved me a number of times (always nice to be notified when something fails).
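For Linux software RAID, even a trivial check in the Nagios plugin style does the job. A sketch, not a real plugin (a degraded md array shows up as a '_' inside the status brackets in /proc/mdstat):

    #!/bin/sh
    # Nagios convention: exit 0 = OK, exit 2 = CRITICAL
    if grep -q '\[.*_.*\]' /proc/mdstat; then
        echo "CRITICAL: degraded md array"
        exit 2
    fi
    echo "OK: all md arrays healthy"
    exit 0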

I know this isn't what you asked but... (5, Interesting)

KendyForTheState (686496) | more than 2 years ago | (#37903026)

20 disks seems like overkill for your storage needs, and the more disks you use, the greater the risk that one or more of them fails. Also, your electricity bill must be through the roof. I have four 3TB drives on a 3Ware controller in a RAID5 array, which gives me the same storage capacity with 1/5th the drives. Aren't you making this more complicated than it needs to be? ...Maybe that's the point?

Re:I know this isn't what you asked but... (0)

Anonymous Coward | more than 2 years ago | (#37903226)

I have 4 3TB drives with a 3Ware controller in RAID5 array which gives me the same storage capacity with 1/5th the drives

Yeah but RAID 5 is nowhere near as reliable as RAID 6 or RAID 1. And if speed is a concern, you're looking for RAID 10, and enough disks to provide the bandwidth you need. Allowing for spares, 20 disks doesn't sound like much for even 8TB of storage (say 16 disk RAID 10 plus 4 hot spares).

And this isn't even counting the disks needed for backups...

Re:I know this isn't what you asked but... (1)

Kagetsuki (1620613) | more than 2 years ago | (#37903766)

Please, if you're going to make such good posts don't do it as AC - you deserve the karma. It just so happens we're looking at a scheme almost exactly like what you outline, we've currently got a dual server "humming" configuration with on and off site backups but we need something more serious after getting an influx of customers.

ZFS (4, Informative)

Anonymous Coward | more than 2 years ago | (#37903034)

LVM, mdadm & Ext4 or ZFS seems like it would be more then adequate for this. A 2U server can hold 36TB of raw data with software raid and consumer disks. 2.5" would be preferable for home use considering power usage unless your a fellow Canadian; in which case servers make great space heaters.

Re:ZFS (0)

Anonymous Coward | more than 2 years ago | (#37903070)

+1 for ZFS

Rsync + VPN (1)

GeneralTurgidson (2464452) | more than 2 years ago | (#37903052)

Set up a mirrored server at a parent's/relative's house that's preseeded, and run rsync jobs to it. Add more storage and you can keep generations too.

Re:Rsync + VPN (1)

imemyself (757318) | more than 2 years ago | (#37903242)

Yep... I do this with Unison so writes on both sides can be replicated. Granted, I'm not replicating significant amounts of data, and I've heard Unison may have problems with large volumes of data. But I think the Internet connection would be more of an issue than that.

Re:Rsync + VPN (0)

Anonymous Coward | more than 2 years ago | (#37903280)

Yes, this. Or put the backup in AWS. I back up all my photos from the iMac to my Linux vm at Amazon with rsync. I'll bet my Amazon bill is less than what those disks and servers add to your electric bill.

Re:Rsync + VPN (0)

Anonymous Coward | more than 2 years ago | (#37903836)

Knowing that Jeff Bezos isn't jacking off over my pictures is worth the extra money.

You still need to make a decision (4, Insightful)

93 Escort Wagon (326346) | more than 2 years ago | (#37903068)

You ask about the technical specifications; but, when commenting regarding the three likely candidates you found, you've put philosophical objections first and foremost. I think you first need to figure out which factor is more important to you - specs, or philosophy. Otherwise you're probably going to waste a lot of time arguing in circles.

Production ready... (1)

Anonymous Coward | more than 2 years ago | (#37903072)

We've had a few problems with Gluster (nodes getting out of sync and corrupting data - despite following the docs to the letter). Very nice in theory, and will be great if the stability gets a bit of work, but until then I'm hesitant to recommend it. We've also found the performance a bit lacking.

Re:Production ready... (1)

Marillion (33728) | more than 2 years ago | (#37903688)

I'm working on a multi-institution team doing biomedical research, and one of the team members is using Gluster. It's 200TB of high-resolution microscopy spread across six brick (aka node) systems. I don't know if the vendor misconfigured it, but it is a complete pig of a system. It's slow. Painfully slow. We ended up copying active data to a small 12TB consumer NAS for analysis and leaving the Gluster as the permanent archive.

No ZFS? (4, Interesting)

theskipper (461997) | more than 2 years ago | (#37903074)

How about ZFS, with your RAID controllers in single-drive mode (or, worst case, JBOD)? Let ZFS handle the vdevs as mirrors or raidz1/2 as you wish. ZFS on Linux is rapidly maturing and definitely stable enough for a home NAS. Or go the OpenIndiana route if that's what you're comfortable with.

My 4TB setup has actually been a joy to maintain since committing to ZFS, with btrfs waiting in the wings. The only downside is biting the bullet and using modern CPUs and 4-8GB of memory. Recommissioning old hardware isn't the ideal way to go; YMMV.

Just a thought.
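A sketch of the mirrored-pairs layout (ZFS's RAID-10 analogue), with hypothetical device names and the controller in JBOD/single-drive mode:

    zpool create tank mirror sdb sdc mirror sdd sde
    zpool status tank   # confirm each mirror pairs drives from different enclosures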

Re:No ZFS? (1)

JoeMerchant (803320) | more than 2 years ago | (#37903384)

ZFS on Linux is rapidly maturing and definitely stable enough for a home NAS.

ZFS has been maturing rapidly for the last 6 years... Didn't it almost make its way into OS-X at one point? I'm not sure I'd put all of my eggs in that particular basket (or any single system, really).

If it's backup you want, I'd look into a system that copies from one type of filesystem onto another. Ever since my QNAP TS-109 took a dump, and my data with it, because of their proprietary "Linux" partition formatting, I've stuck to nice, simple, low-performance solutions like 2TB USB drives straight out of the box. They are readable on Win/Lin/OSX and plug-and-play on anything that has called itself a personal computer in the last 10 years. If you need something esoteric like a single 20TB volume, then this isn't the way, but I think the wise course is to find a way to not need something esoteric.

Re:No ZFS? (3, Insightful)

hjf (703092) | more than 2 years ago | (#37903460)

ZFS isn't free anymore. It's all commercial and proprietary and no bugfixes or anything get released outside a big bad support contract with Oracle.

If you want the free version you can still use v28 on FreeBSD and Solaris Express (no upgrades in over 1 year). Works great, the only thing you don't get is ZFS crypto (transparent encryption).

Re:No ZFS? (0)

Anonymous Coward | more than 2 years ago | (#37903544)

Wrong. Nexenta Core (a free, Hardy-based distribution) and NexentaStor Community Edition (NAS software, free, up to 18TB) are both available, and they have the latest, constantly improving versions of ZFS based on Illumos. Lots to love there. (www.nexenta.org or www.nexenta.com)

Re:No ZFS? (1)

bill_mcgonigle (4333) | more than 2 years ago | (#37903546)

If you want the free version you can still use v28 on FreeBSD and Solaris Express (no upgrades in over 1 year).

or linux [zfsonlinux.org] .

Re:No ZFS? (-1)

Anonymous Coward | more than 2 years ago | (#37903736)

Nice try! Unfortunately your unreasoning hatred of Oracle has blinded you. Solaris 11 Express was released in source form 8 months ago. If you don't want Solaris 11 you can use OpenIndiana or Illumos or SmartOS. You may have reason to dislike Oracle's business practices, but don't lie about Solaris source availability.

Thoughts on OCFS (3, Interesting)

trawg (308495) | more than 2 years ago | (#37903076)

We have been using OCFS [oracle.com] (Oracle Cluster File System) for some time in production between a few different servers.

Now, I am not a sysadmin so can't comment on that aspect. I'm like a product manager type, so I only really see two sides of it: 1) when it is working normally and everything is fine 2) when it stops working and everything is broken.

Overall from my perspective, I would rate it as "satisfactory". The "working normally" aspect is most of the time; everything is relatively seamless - we add new content to our servers using a variety of techniques (HTTP uploads, FTP uploads, etc) and they are all magically distributed to the nodes.

Unfortunately we have had several problems where something happens to a node and it seems to lose contact with the filesystem. At that point the node pretty much becomes worthless and needs to be rebooted, which seems to fix the problem (there might be less drastic measures, but this is all we have at the moment).

So far this has been JUST not annoying enough for us to look at alternatives. Downtime hasn't been too bad overall; now that we know what to look for, we have alarming set up so we can catch failures a bit sooner, before things spiral out of control.

I have very briefly looked at the alternatives listed in the OP and look forward to reading what other readers' experiences are like with them.

Re:Thoughts on OCFS (0)

Anonymous Coward | more than 2 years ago | (#37903554)

We replaced the F in OCFS with something else after the same type of issues... oh, and its fencing method is pretty ugly.

Re:Thoughts on OCFS (0)

Dishwasha (125561) | more than 2 years ago | (#37903670)

Thank you; this is one of the few valid answers to my primary question, which was about actual experience with clustered file systems. I don't think most of the responders got the clue that I'm looking for a solution that will scale over a decade. Investing in separate backup hardware doesn't work because it won't scale seamlessly either. I also don't think many of these people realize that snapshots are effective backups, and that with distributed and clustered filesystems you can ensure your snapshots are stored redundantly without overwrite corruption. Although RAID is not backup, dedicated backup hardware is really old hat. And when my paranoia really sinks in, I can transport those snapshots to alternative media to balance the MTTF of platter-based technologies, when the hardware matches my budget. And lastly, I think they missed the mention of virtualization, so they're probably not thinking much about HA and qcow2 base imaging. I'll likely be upgrading to a Super Micro 2U Twin with QDR InfiniBand, which none of the suggested alternatives support, in contrast to Lustre and GlusterFS.

Re:Thoughts on OCFS (3, Insightful)

afabbro (33948) | more than 2 years ago | (#37903862)

Thank you, this is one of the few valid answers to my primary question which is of actual experience with clustered file systems. I don't think most of the responders got the clue that I'm looking for a solution that will hopefully scale over a decade's worth of time.

There is a question of missing clues, but I don't think it's the responders. You either asked your question poorly or you don't understand your problem. Your question centers on being "paranoid about data loss", and yet you're discussing technologies designed to manage concurrent access to a filesystem. Do you put in gigabit Ethernet when you want faster USB performance?

I'll likely be upgrading to a Super Micro 2U Twin with QDR Infiniband

Give me a break...

AWS EBS (1)

curmudgeon99 (1040054) | more than 2 years ago | (#37903090)

Why are you spending your money like that? Sneaker-net your drives to AWS EBS. It's a no-brainer.

Re:AWS EBS (2)

Enfixed (2423494) | more than 2 years ago | (#37903306)

Why are you spending your money like that? Sneaker-net your drives to AWS EBS. It's a no-brainer.

AWS EBS = $0.10 per allocated GB per month, or $102.40 per TB... I doubt power and hardware are costing him > $819.20 a month.

Re:AWS EBS (0)

Anonymous Coward | more than 2 years ago | (#37903424)

I think your quote is a bit high. And his current solution cost him half his data. What was that worth?

Re:AWS EBS (0)

Anonymous Coward | more than 2 years ago | (#37903494)

You could just Google "AWS EBS cost" and confirm for yourself... but here, I'll give you the direct link: http://aws.amazon.com/ebs/ [amazon.com] (Hint: it's at the bottom under projecting costs.)

I was going to say Lustre, but... (3, Insightful)

Anonymous Coward | more than 2 years ago | (#37903096)

I was going to say Lustre, but then I saw that you only have 16TB. 15 years ago that would have been impressive, but these days, those supercomputers you mention probably have that much in DRAM, and their file storage is in the multi-petabyte range. Lustre is optimized for large scale clusters, in which you have entire nodes (a node is a computer, here) dedicated to I/O - bringing external data into the in-cluster network fabric, while other nodes are compute nodes - they don't talk to the outside world, except by getting data via the I/O nodes.

That's why you'll see all this talk of OSSs and OSTs, as though they'd be distinct systems - on a large scale cluster they are.

For only 16TB, what you want is a SAN, or maybe even a NAS.

If you want open source, then go with openfiler. It supports pretty much everything. I haven't stress tested it, but it seems to work well for that order of magnitude of data.

Tahoe-LAFS (1)

the_brobdingnagian (917699) | more than 2 years ago | (#37903148)

Try Tahoe-LAFS [tahoe-lafs.org] .

Re:Tahoe-LAFS (1)

Dishwasha (125561) | more than 2 years ago | (#37903680)

Not a bad suggestion and more helpful than most. Thanks for the input!

rm (1)

kbrint (924971) | more than 2 years ago | (#37903188)

/bin/rm

huh (1)

madcat2c (1292296) | more than 2 years ago | (#37903196)

I wonder how long it would take to backup 8TB to carbonite dot com?

LTO4 (1)

hawguy (1600213) | more than 2 years ago | (#37903206)

I think the best disk-hardware-agnostic solution for preventing filesystem data loss is an LTO-4 autoloader and regular tape backups (hopefully taken off-site regularly). They are pretty cheap; a SuperLoader 3 with an 8-tape (6TB/12TB) capacity is less than $3000. Or buy a refurbished LTO-3 autoloader for a third of the price and half the capacity.
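A full backup to tape is about as simple as backups get; a sketch using the usual non-rewinding tape device node and a hypothetical data path:

    tar -cvf /dev/nst0 /srv/data
    mt -f /dev/nst0 rewoffl   # rewind and take the drive offline when done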

Bad Dog. Wrong Tree! (3, Insightful)

SmurfButcher Bob (313810) | more than 2 years ago | (#37903228)

You will spend all this effort to build this solution... and then your house will catch fire.

On the good side, the fire department WILL manage to save the basement by filling it with 80,000 gallons of water at 2,000GPM per fire engine.

Or, you'll be wiped out by a flood. Or a drunk will drive through the side of your house. Or you'll have a gas leak and the house will detonate. Or carpenter ants will eat away the floor joists.

RAID is not a backup solution. Neither is replication... if you whack the data, the whack will likely be replicated. If a machine somewhere gets compromised, the files it touches will likely be replicated. The only thing you're creating is an overly complex hardware mitigation. If THAT is how you define "data preservation"... you're doing it wrong.

Look instead for a solution that moves stuff offsite: a cheap pair of wireless-N routers running Tomato or OpenWrt, a link to a neighbor's house, and you reciprocate with each other. Bonus points if you use versioning, transaction logs, journals, etc.

Re:Bad Dog. Wrong Tree! (0)

Anonymous Coward | more than 2 years ago | (#37903530)

I wish Bob were my neighbor.

Obilg. (1)

jampola (1994582) | more than 2 years ago | (#37903232)

"For over a decade I have had arrays of 10-20 disks providing larger than normal storage at home"

At home?? I've met some people pretty fanatical about their porn collections but this hits some new highs! Kudos to you, Sir!

Paranoid about your data? Do off site backups! (1)

Anonymous Coward | more than 2 years ago | (#37903254)

Take it from someone who has been there: there is no better resource than Iron Mountain to store a backup copy of your data.

Go offsite every day for fulls, or once a week, depending on how important your data is. Do fulls and deltas.

Hadoop HDFS (1)

mrcheesyfart (923434) | more than 2 years ago | (#37903266)

You can use Apache Hadoop's HDFS. http://hadoop.apache.org/hdfs/ [apache.org] It is fairly simple to set up, very scalable, and it is very easy to set a replication factor so that all your data is replicated two, three, or even more times across your cluster. It is used in many places for distributed computing, but I see no reason it couldn't serve you well as a large personal file service.
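The cluster-wide default is dfs.replication in hdfs-site.xml; it can also be changed per path after the fact. A sketch with a hypothetical path:

    hadoop fs -setrep -w 3 /user/archive   # -w waits until re-replication finishes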

Re:Hadoop HDFS (1)

diekhans (979162) | more than 2 years ago | (#37903380)

It's not a POSIX filesystem.

unRAID (1)

aarongadberry (1232868) | more than 2 years ago | (#37903294)

unRAID works well as a home solution. I had two 2TB drives fail within a one-month period and lost no data.

Drobo? (1, Informative)

varmittang (849469) | more than 2 years ago | (#37903332)

A Drobo Pro with 3 TB drives set up with dual redundancy will get you 18 TB of drive space. In the future, just swap out drives as drive sizes get larger and you can continue to expand. www.drobo.com

Re:Drobo? (2)

speedingant (1121329) | more than 2 years ago | (#37903558)

Slow as molasses though. Way slower than any other solution out there..

Re:Drobo? (0)

Anonymous Coward | more than 2 years ago | (#37903690)

A Drobo Pro with 3 TB drives set up with dual redundancy will get you 18 TB of drive space. In the future, just swap out drives as drive sizes get larger and you can continue to expand. www.drobo.com

Performance is poor with a Drobo. The price isn't so great either. If you want a decent setup that you can tinker with but is reliable, go with unRAID.

Stop with experimental shit (1, Insightful)

ArchieBunker (132337) | more than 2 years ago | (#37903338)

Seriously, stop with the experimental filesystem projects still in beta. You need one that is mature and time-tested. Do a bit of research. I don't even run RAID and have yet to permanently lose anything in probably 20 years.


Oracle Cluster File System or Global File System (1)

xose (219487) | more than 2 years ago | (#37903398)

http://en.wikipedia.org/wiki/Global_File_System
http://en.wikipedia.org/wiki/OCFS

Lustre (3, Informative)

JerkBoB (7130) | more than 2 years ago | (#37903468)

Lustre is pretty cool, but it's not magic pixie dust. It won't break the laws of physics and somehow make a single node faster than it would be as a NFS server. It's for situations when a single file server doesn't have the bandwidth to handle lots of simultaneous readers and writers. A "small" Lustre filesystem these days usually has 8-16 object storage servers serving mid-high tens of TB. The high end filesystems have literally hundreds of OSSes and multiple PB served. The largest I know of right now is the 5PB Spider [nccs.gov] filesystem at Oak Ridge National Labs.

One nice thing about Lustre on the low end is that you can grow it... Start out small and add new OSSes and OSTs as you need them. This often makes sense in Life Sciences and digital animation scenarios where the initial fast storage needs are unknown or the initial budget is limited (but expected to grow). But if you're never planning to get beyond the capacity of a single node or two, Lustre is just going to be overhead. I don't know much about the other clustered filesystem options.

Re:Lustre (1)

Dishwasha (125561) | more than 2 years ago | (#37903706)

Yeah, the complication is why I'm leaning more towards GlusterFS, yet so far Lustre is more proven. Unless I get some useful anecdotal experience here I'll probably model out all three solutions with VMs and do my own comparisons and performance analysis. Maybe I'll even post my experiences and results here afterwards.

Fix the machines first... (2)

k9mach3 (2497704) | more than 2 years ago | (#37903474)

Lustre: no replication (it's on the roadmap for sometime in the next few years), and it relies on access to shared storage (read: an FC/iSCSI disk array, and if that fails you lose your data).

OCFS: no replication; designed for multiple servers accessing one array.

Ceph: has replication, but still in active development, and somewhat complex. Good if you don't mind losing your data (it's in alpha... if it breaks, you get to keep both pieces).

GlusterFS: I have no experience with it, but it seems to be pretty stable at this point, and it has some degree of replication, which is a plus.

If all you're going for is replicated storage across two systems, I'd recommend just setting them up separately and rsyncing from one to the other. Otherwise, one filesystem crash will take out all your data. Parallel filesystems can buy you some reliability, but still can't be considered "backup" strategies. And you still need to pay attention to things like RAID (at least RAID6! RAID5 is likely to fall apart after one disk failure with >2TB disks).

Re:Fix the machines first... (1)

Dishwasha (125561) | more than 2 years ago | (#37903750)

http://wiki.lustre.org/index.php/Lustre_2.0_Features [lustre.org] lists filesystem replication as a feature of Lustre 2.0, back in November 2009. I won't be running any RAID, since my requirement isn't really to reduce the number of disks used by relying on parity; one or more replication partners/mirrors will handle that function. Rsync won't work for the aforementioned clustered virtualization needs.

eBay. look for an old NetApp (0)

Anonymous Coward | more than 2 years ago | (#37903480)

seriously. NetApp

Performance (3, Informative)

speedingant (1121329) | more than 2 years ago | (#37903508)

What kind of performance are you after? If you don't need anything over 40 MB/s, I'd go for unRAID. I use it at home and it's brilliant. I've replaced many drives over the years, and I've had two hard drives fail with no massive consequences (data isn't striped). Plus, many, many plugins are now available: SimpleFeatures (a replacement GUI), Plex Media Server, SQL, email notifications with apcupsd support, etc.

Check out MFS (0)

Anonymous Coward | more than 2 years ago | (#37903618)

MFS (MooseFS) seems to work quite well. It's designed for large files, not lots of small ones, and it will allow you to set the number of replicas required at the file or directory level.

http://www.moosefs.org/
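The per-file/per-directory replica count is what MooseFS calls a "goal"; a sketch with a hypothetical mount point:

    mfssetgoal -r 2 /mnt/mfs/media   # keep two copies of everything under here
    mfsgetgoal /mnt/mfs/media        # verify the setting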

Two servers using ZFS (1)

drsmithy (35869) | more than 2 years ago | (#37903634)

One as the primary, sharing space via NFS for your VMs and whatever else. Throw a couple of SSDs in there for caching.

The second replicating from the first (via ZFS send/receive, or just simple rsync) with snapshotting for backups and regular syncs to some off-site data store for truly irreplaceable data.

This is the setup I use at home, and it sits behind a 3-node VMware cluster, several desktop PCs (one of which boots from the main server over iSCSI), and couple of media PCs.

Other than that, your requirements seem a bit confused. "Cluster filesystem" looks to be a buzzword thrown out there without any actual need for one. "The cluster filesystem should work well with hosting virtual machines in a high-available fashion thereby supporting guest migrations" is a non sequitur, as neither a cluster filesystem nor high availability is a necessity for guest migration.

What are your key requirements here? Data reliability is a lot easier (= cheaper) to achieve than high availability, and it's a struggle to see how real high availability could be any sort of requirement in a home-server scenario.

Re:Two servers using ZFS (1)

Dishwasha (125561) | more than 2 years ago | (#37903834)

Hmmm... I'll have to look more deeply into ZFS, as I keep hearing it thrown out there. I should probably have qualified my statement as "thereby supporting live guest migrations". The non sequitur was basically a hint thrown in to suggest what I meant by high availability, for those less likely to catch the subtle distinction of what high availability typically means. Just like most other things in life, my key requirements may be more basic than what I've described, but if the car salesman throws in the air freshener free with the car, I'll take it as long as it doesn't stink.

cheap NAS (1)

borgasm (547139) | more than 2 years ago | (#37903662)

Go buy two or three cheap 8-10TB NAS devices.

Cycle one of them through every few months for a backup, and store it at another physical location.

That will run you less than $3000 total, with a lot fewer headaches.

Serverfault would be a much better choice (0)

Anonymous Coward | more than 2 years ago | (#37903666)

http://serverfault.com

Hardware RAID is a no no. (0)

Anonymous Coward | more than 2 years ago | (#37903702)

This is what I understood the common wisdom to be for years now.

Wow... (1)

RecoveredMarketroid (569802) | more than 2 years ago | (#37903720)

You're serious about protecting your porn...

MooseFS is Solid and mature (1)

Anonymous Coward | more than 2 years ago | (#37903724)

I have been using MooseFS for over a year now; it has proven amazingly solid and very easy to set up and manage. I am running a 600 TB install that maintains over 40 million files for a large music service.
Check out the MooseFS website:
http://moosefs.org

MooseFS can also run on any Unix-like system, so you are not restricted to Linux; I have connected Linux, FreeBSD, and Mac OS X systems to it. It also scales very cleanly and was much faster in our initial tests than GlusterFS. I highly recommend it!

what are you trying to fix? (1)

dameon (72340) | more than 2 years ago | (#37903726)

Is online redundancy (i.e., availability) your concern? Or is it recoverability?

If your concern is the ability to recover in the event of hardware failure, you are over complicating the situation. I have about 1.5 TB of "data" between pictures of the family, movies, music, games, configs, documentation, and the list goes on. So, my primary storage server at home has 2x 2TB Western Digital Green drives that are just in a simple Linux software mirror. I also have two more disks that alternate between my house and a safe deposit box at the bank. About once a month (or more frequently if I add files to my server), I rsync my data to the disk at home, and take it to the bank.

The sync script does a simple rsync --delete -avx /blah/ /backup/. I also mount /blah (the source) read-only while I do the rsync, to prevent something stupid from happening.
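Sketched out, that script amounts to a remount/sync/remount sequence:

    #!/bin/sh
    mount -o remount,ro /blah             # freeze the source against fat-fingering
    rsync --delete -avx /blah/ /backup/
    mount -o remount,rw /blah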

Now, you mentioned you had a large array, and that's fine. I'd buy a few 3TB drives and create a volume group with them, create your /backup on that volume group, and do the same thing. These are backup disks, they don't need to be fast.

I don't trust hardware RAID (specialized controller RAID), and while I am a Unix admin and manage large GPFS, Ibrix, and GFS clusters at work, I think that simplicity is always better.

The safe deposit box costs me about $25 / year, and keeps me safe in the event of a fire, theft, meteor, zombie invasion, etc.

A friend suggested that I just put a few drives in one of his servers, and rsync via ssh to his box. I don't want to do this for two reasons.
1) I don't have a lot to hide, but I don't really want everyone poking through all my pictures and whatnot
2) I'm lazy, so I'd probably script it up and I wouldn't think about it until I needed it. So, it wouldn't prevent me accidentally blowing data away on the replica before I noticed I blew something up.

Redundant RAID arrays (0)

Anonymous Coward | more than 2 years ago | (#37903762)

I worked for a 911 call center. We had redundant RAID arrays for the Oracle database: one array was RAID 0+1 (mirrored stripes), and the other was RAID 6 (block-level striping with double distributed parity). The idea was that with that redundancy, the odds of both arrays crapping out are low: wear patterns on the drives will not be identical, so whichever one dies first gets fixed first, then mirrored, and if the other dies the next day, you don't lose any data and there is no downtime.

It's not a bad idea to archive data too. That was done on a regular basis. I have my own databases for a website. Not huge, not Oracle, but I run weekly archives (I dump the database, and archive all of the scripts and website directories and compress it all). My database currently only takes up about 600MB, and the website directories (and all the config files) maybe another 150MB. My directory archives are incremental after the first, to save space.

Either you archive, make redundant archives, or risk data loss. I've worked for places that moved tapes offsite. That's one option; another is a WiFi-connected NAS in the garage. If there is a fire, your data is safe. It's another option you could look at. Just sayin'.
