
Distributed Storage Systems for Linux?

Cliff posted more than 9 years ago | from the spread-out-all-ova-tha-place dept.

Data Storage | 52 comments

elambrecht asks: "We've got a _lot_ of data we'd like to archive and make sure it is accessible via the web 24/7. We've been using a NetApp for this, but that solution is just waaaay to expensive to scale. We want to move to using a cluster of Linux boxes that redundantly store and serve up the data. What are the best packages out there for this? GFS? MogileFS?"


Lustre (5, Informative)

Yerase (636636) | more than 9 years ago | (#12442208)

Check out Lustre at http://www.lustre.org/ [lustre.org]. It's being developed and used by the DOE on a lot of supercomputer cluster systems for multi-terabyte storage.

Lustre for ClusterFS (1)

SanityInAnarchy (655584) | more than 9 years ago | (#12448269)

It looks great, except for one thing:

2.4 only.

The system itself is developed in a VERY cathedral-like style by a company called ClusterFS, who is selling the 2.6 version. My guess is they'll release it for free when Linux 2.8 or 3.0 is released, so that they always give away the obsolete version and sell the new one.

Re:Lustre for ClusterFS (1)

aminorex (141494) | more than 9 years ago | (#12454549)

What, pray tell, does 2.6 offer that makes it needful for the questioner's application?

needful? (1)

SanityInAnarchy (655584) | more than 9 years ago | (#12460058)

It's just a lot nicer in a lot of ways. What, pray tell, does Lustre offer that makes it "needful" in that situation? He could just get some sort of a SAN device. But Lustre would probably be nicer. In the same way, 2.6 is nicer.

I've seen systems still running 2.2, but it's not pretty. 2.6 is the way to go these days. Lustre is about the ONLY reason anyone should be using anything older.

How big? (1)

b00stA (839177) | more than 9 years ago | (#12442286)

It would be useful to know roughly how much data we're talking about.
I suppose there's a difference between serving just 500GB or a few terabytes.

What else... (1)

Albigg (658831) | more than 9 years ago | (#12442412)

Also, what else do you need to do with the data? Back it up? Mirror it? Data mine it? Any type of performance requirements? Interoperability requirements? The list goes on. We need more information.

Re:How big? (1)

thsths (31372) | more than 9 years ago | (#12444215)

> I suppose there's a difference between serving just 500GB or a few terabytes.

If it is only that, you could just stick 8 disks in a nice server system; that gets you a few terabytes. Use your filesystem of least mistrust, such as reiserfs, XFS or JFS.

RAID 1+0 or 10 is very nice, but you still have a single point of failure. NetRAID and some kind of fail-over redundancy with another machine may be the way to go. You will not get a full SAN/NAS this way, but you also avoid a lot of complexity.
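
To put numbers on it, the software-RAID version of that is about two commands. This is only a sketch; the drive names and the use of the Linux md RAID 10 personality are assumptions:

$ mdadm --create /dev/md0 --level=10 --raid-devices=8 /dev/sd[b-i]   # mirror+stripe across 8 disks
$ mkfs.xfs /dev/md0                                                   # or your filesystem of least mistrust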

Panasas -- check it out (3, Informative)

middlemen (765373) | more than 9 years ago | (#12442472)

Panasas http://www.panasas.com/products_overview.html [panasas.com] has some products which probably fit your requirement of high speed distributed storage.

Re:Panasas -- check it out (1)

DA-MAN (17442) | more than 9 years ago | (#12445578)

> Panasas http://www.panasas.com/products_overview.html [panasas.com] has some products which probably fit your requirement of high speed distributed storage.

Panasas is a GPL violator, as per conversations with them during a demo. They have a proprietary module that is built against RHEL kernels only. They confirmed that this module uses kernel headers and other parts of the kernel, but they refuse to release the source to any customer.

If you ever need to upgrade kernels in a hurry or switch branches (2.4 to 2.6), or if the company goes out of business, you'll be left with just a bunch of NFS file servers with locally attached disks. It also makes debugging kernels a pain.

Re:Panasas -- check it out (0)

Anonymous Coward | more than 9 years ago | (#12446717)

A proprietary module is not a GPL violation.

Re:Panasas -- check it out (1)

DA-MAN (17442) | more than 9 years ago | (#12447205)

> A proprietary module is not a GPL violation.

A self-contained module is not a GPL violation. Having a GPL'd wrapper that can load your proprietary module is not a GPL violation. But having a driver with kernel headers and kernel source compiled into it would make it a GPL violation, since it is a "derivative work".

Our crystal ball is fuzzy! (4, Insightful)

afabbro (33948) | more than 9 years ago | (#12442500)

What kind of idiotic Ask Slashdot is this? All of the important data is missing:
  • What's "a lot"? 1MB is a lot of data if you think about it. When people start talking about "a lot" of data these days, I assume they're meaning hundreds of terabytes. Is that what you mean?
  • What's the budget? What performance do you need? Do you need to back it up? Do you need to replicate it? Your post is sort of like "hi, I have a problem. What is the answer? Thanks!"
Also, it's "too expensive to scale," my friend. You'd think an "Editor" like Cliffy would fix posts, but he's too lazy.

If you can afford NetApp, why not keep with NetApp? A bunch of Linux boxes is not a storage solution. Indeed, what does Linux have to do with anything? We're talking storage here. What are you planning to do - put in 200 of them with internal SATA drives? Yeah, that'll be a lot cheaper to maintain...

I'm not shilling for NetApp, but if you really have "a lot" of data to put "on the web" "24/7" then you need some kind of real storage solution like a NetApp or one of their competitors.

Now go away and please take Cliff with you.

look at the rest NetApp is 4th place (2, Informative)

johnjones (14274) | more than 9 years ago | (#12442868)

NetApp is number four in storage revenue terms, after EMC, HP and IBM,

so go ask them about what you want.

Really, you can admin your own white boxes (that become a NAS), or you can buy a NAS.

Or are you thinking SAN?

Also talk to Apple, they do some nice products, as does Sun.

What's this for, large data? Video data? Go talk to SGI and their XFS products.

Really, it depends on what you're doing. NetApp is great for a company filesystem of documents, but bad if you want to get the most out of your storage and you mostly do video/music and don't care about snapshots, etc.

regards

John Jones

Re:look at the rest NetApp is 4th place (1)

superpulpsicle (533373) | more than 9 years ago | (#12446091)

SAN is the ultimate storage solution. IBM, HDS, EMC, HP, SGI, Engenio: go with any of them. Go with the cheap lineups if you have to.

NAS/NetApp is so overrated, more like 7th place. The only reason it has made headway is that it is so close to NFS, and everyone can do NFS commands.

Re:Our crystal ball is fuzzy! (2, Insightful)

Punboy (737239) | more than 9 years ago | (#12446462)

> A bunch of Linux boxes is not a storage solution.

Hey man, don't tell that to Google.

Re:Our crystal ball is fuzzy! (1)

prairiedock (626753) | more than 9 years ago | (#12447178)

Also, it's "toO expensive to scale," my friend. You'd think an "Editor" like Cliffy would fix posts, but he's too lazy.

I don't think the Slashdot editors can spell either. An "editor like Cliffy" recently allowed the word "persue" to appear in a headline.

forced to get NAS windows 2000 embedded (1)

mrbass (742021) | more than 9 years ago | (#12442562)

Back about 4 years ago we were forced to get a Maxtor NetAttach (can't remember the exact name) because at the time journaling file systems were virtually non-existent. That lasted for a year before we outgrew it, and then we went with a Dell NAS server (600GB), also Windows 2000 Embedded. It has a SCSI connection for a SuperDLT tape backup drive, and the Windows 2000 backup program works for our needs. Simple for any average joe to restore files.

I did look into getting a Linux NAS, but the solutions out there didn't support external tape drives like SuperDLT all that well. Backup software was crazy on Linux; I tried over 15 (IIRC) different Linux backup solutions, everything from ARCserve for Linux to free backup scripts. I just use tar to back up our mail server, and while restoring isn't totally intuitive, I manage.
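
For the record, the tar routine is nothing fancy. A minimal sketch, where /dev/st0 and /var/mail are assumed names for the tape drive and mail spool:

$ tar -cvf /dev/st0 /var/mail     # dump the mail spool to the SCSI tape drive
$ mt -f /dev/st0 rewind           # rewind the tape before restoring
$ tar -xvf /dev/st0 -C /          # restore everything back in place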

So please tell me there is a decent Linux NAS solution out there that has a web-based interface for management (adding users, groups, etc.), has a SCSI connection for LTO/DLT tape drives, and comes with a decent backup and restore program/interface.

Re:forced to get NAS windows 2000 embedded (1)

CharlieHedlin (102121) | more than 9 years ago | (#12444915)

Snap Appliances would seem to meet your requirements.

They are Linux-based, have web management and the SCSI port, and they sell backup and restore software.

I use one, but I back it up as a linux server with Veritas Netbackup to an existing tape robot on another linux server.

AFS ?? (3, Interesting)

forsetti (158019) | more than 9 years ago | (#12442660)

How about OpenAFS [openafs.org]? It is sort of like NFS on steroids, with redundancy, scaling, caching, Kerberos-based security... I've just started looking at it myself, but it seems pretty slick.

Re:AFS ?? (1)

runderwo (609077) | more than 9 years ago | (#12444102)

AFS is a distributed filesystem with highly integrated clustering. What catches many people off guard about AFS is the way it does replication. Volume replicas are read-only, and in order to update the replicas, a command must be issued (vos release) - they are not automatically kept up-to-date. As long as your replication needs are read-only and you don't need coherence automatically maintained for you, AFS might be the way to go.
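
For reference, the release is a single admin command; the volume name below is just a placeholder:

$ vos release web.data    # push the read/write volume's contents out to its read-only replicas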

Re:AFS ?? (1)

Benley (102665) | more than 9 years ago | (#12462748)

Lacking read/write replica support can be annoying at times, but the manual release of the read/write copy to the replicated ro clones can be damn handy. Think about it - you get a free builtin staging area for whatever data you are pushing out! I use it that way, at least, for staging new versions of websites and applications before "releasing" them to the replicated read-only sites that are visible to the world.

Centera (4, Informative)

egarland (120202) | more than 9 years ago | (#12442927)

Get A Centera.

I'm biased, but this is a high-level Linux-based storage system done right. It's not easy to create a coherent storage system out of lots of separate machines; the software that runs on this cluster does a lot of work. The thing is fully redundant with no single point of failure, dynamically expandable without even taking it offline, and it scales to hundreds of terabytes while managing all that content continuously (scanning for corruption and fixing it, garbage collecting, etc.). The cluster has redundant backend networks and parallel paths everywhere; it even uses reiserfs to store the data. There's a lot of good engineering in this unit, and they sell it at a decent price compared to NAS boxes.

Check it out:
http://www.emc.com/products/systems/centera.jsp [emc.com]
I do work for EMC (like I said.. I'm biased) but I don't speak for them, my opinions are my own.

Storage clustering is simply hard to do while still presenting a low level filesystem interface. Tossing that out and creating file storage as a high level service with a richer interface seems like the right approach to me. Show me a storage clustering solution that doesn't do that and I'll show you something full of bugs, expandability issues, limitations, and pain points.

Re:Centera (1)

7213 (122294) | more than 9 years ago | (#12446380)

Not to knock your employer (OK, so maybe to knock your employer), but the Celerra has soured my taste for EMC NAS (or NAS-type) solutions so much that, even without ever having worked with NetApp, I would recommend them.

Of course, the Centera always did sound interesting. But last I heard you still needed to write to the Centera API: no block access for you (and no real NAS-type access either). But I did hear murmurs of this changing.

Anything to avoid a Celerra.

Re:Centera (1)

egarland (120202) | more than 9 years ago | (#12447833)

> But last I heard you still needed to write to the Centera API: no block access for you (and no real NAS-type access either). But I did hear murmurs of this changing.

There are at least two products you can put in front of Centera to make it look like a standard filesystem: CUA (a Centera-specific EMC product) and Legato Disk Extender. The tradeoff is that by interfacing with it like a filesystem, you re-introduce the limitations of filesystems and lose all the automatic functionality the API gives you. If you compare a program that writes files with one that writes to a Centera, the Centera one is usually simpler and works better. The API has limited language support (C and Java), but since it talks C, it's usually easy to build bridges from other languages.

This type of storage just isn't block- or file-level. It's higher level than that. Because of that, just like programming in Java vs. C, you lose a little control over how things are done, but in the end, if they are done right... who cares?

I think once standard interfaces to this type of storage are built, using it will be a no-brainer.

Again... I work for EMC but I don't speak for them; my thoughts are my own.

Re:Centera (1)

Crypt0pimP (172050) | more than 9 years ago | (#12447109)

As long as we're being honest...

I was just laid off from Xiotech (with about 100 others), one of the smaller SAN vendors.

But we call the Centera a "data jail". It's like the roach motel...

Data checks in, but it don't check out. It can't scale beyond a 42U rack enclosure. It's a bunch of little servers striped together to form a big NAS with a metadata controller in the middle.

Just bashing the 800lb gorilla...
OTOH, if you're hiring, I'm willing to tell you your products could rule the world!

Regards.
Patrick

Re:Centera (2, Insightful)

egarland (120202) | more than 9 years ago | (#12448513)

> But we call the Centera a "data jail". It's like the roach motel...

Ugh. It's just not true. Most applications that are built to work with Centera include functionality to migrate data in and out of the system, just like most applications built to work with tape can both put data on and get it back. The difference is that tape sucks and Centera doesn't.

> It can't scale beyond a 42U rack enclosure.

Also not true. I have worked extensively with a 3-rack install with about 50TB of data on it. I believe all versions of Centera since the very first are capable of scaling to 4 racks, and some are capable of going to 8. Lots of customers have 2-rack installs. Raw storage on the currently shipping nodes is over 1TB per node, and you can put 32 nodes in a rack. Do the math: a 4-rack Centera is quite big, even after taking mirroring or CPP into account.

> It's a bunch of little servers striped together to form a big NAS with a metadata controller in the middle.

No. No No.

It IS a bunch of little servers, but no, they are not "striped together", and no, they don't form a NAS. There is no "metadata controller", and there certainly isn't one in the middle. It is a storage cluster with features specifically designed to store fixed content. Centera is not a simple Linux hack to make a bunch of boxes look like a storage cluster. It's a robust, flexible, well-thought-out piece of clustering software built on top of a Linux base.

Centera hardware is good stuff too. It has redundant externally facing servers (access nodes) so that if one fails, applications can keep working. Both back end switches are linked to every node so everything has redundant data paths. Data is stored in such a way that no data is unavailable if any single node fails or goes offline for any reason.

It's easy to dismiss Centera because it's so different from the standard storage systems whose basic interfaces really haven't changed in 3+ decades. It's not a block device. It's not a filesystem. It's not a mountable share. It's a storage cluster with functionality specifically designed to manage fixed content. It is accessed only through a client-side API that talks to the cluster over IP. It isn't easy to wrap your head around.

Re:Centera (1)

egarland (120202) | more than 9 years ago | (#12448533)

Oh yeah... I forgot the obligatory...

I work for EMC but I don't speak for them, my thoughts are my own even if I sound like an EMC cheerleader/sock puppet.

ACE EMC SAN (1)

moishevd (613042) | more than 9 years ago | (#12443013)

SAN solutions via the EMC/Dell package: http://www1.us.dell.com/content/products/compare.aspx/fibre?c=us&cs=555&l=en&s=biz [dell.com]. In my opinion it's the best ROI for the buck: a CX300 with 15 146GB drives, with SnapView, PowerPath and Navisphere, and Gold support on all software and hardware components, for $55,000. Extremely scalable. Database cluster performance improves by well over 50% compared to NAS or DAS. Whatever package you run on top, from middleware for embarrassingly parallel cluster farms to MogileFS to MSCS, will be pretty happy, assuming your database/app is optimized, etc. And then you should run VMware ACE.

Ask Google (2, Interesting)

VernonNemitz (581327) | more than 9 years ago | (#12443088)

I'm sure they'd be happy to sell you something along the line of serving data....

Even easier (1)

WindBourne (631190) | more than 9 years ago | (#12454531)

Create a number of accounts at Gmail. Then simply e-mail them your .tgz'd data.

Clustering filesystems- an overview (4, Informative)

houdini_cs (876959) | more than 9 years ago | (#12443170)

I did some research on clustering filesystems for work a while ago. Here's the Cliffs-notes version:

GFS
High-end, a pain in the ass to set up and run. Wants a RHEL server or two to run.
OpenGFS
Started as a fork of GFS when the GFS license changed, it has followed a bit of a different path. Not nearly as stable or fast as GFS, but might be there some day.
Lustre
Lustre should be really nice, but is horrendous to run (at least, that's the word from my friends at Sandia, who know a thing or two about it). General consensus is that you need a full-time staff member just to make it work. If you can afford that, it's a good way to go.
PVFS
Fast, light-weight, not POSIX-compatible. If your apps don't need the stuff it doesn't do, or you're willing to write some glue code for your app to speak PVFS natively instead of using the FS driver, this is a great way to go. Looks simple to set up (as simple as these things get).

Re:Clustering filesystems- an overview (1)

Motherfucking Shit (636021) | more than 9 years ago | (#12448542)

> Here's the Cliffs-notes version:
Nah, it can't be... There weren't any typos, and I haven't seen it posted before!

Re:Clustering filesystems- an overview (0)

Anonymous Coward | more than 9 years ago | (#12464817)

PVFS also has no redundancy built-in.

Converting extra Windows(tm) workstation space? (4, Interesting)

Dr.Dubious DDQ (11968) | more than 9 years ago | (#12443445)

A barely-related subject: I've been wondering whether there's some way to collect the unused space on all the Windows workstations around here into a shared pool for storage.

This is purely a speculative exercise, but I keep wondering if some combination of:

  • Every Windows(tm) workstation "shares" an otherwise-empty subdirectory
  • a Linux box creates and uses a "filesystem image" file of some kind ("loopback mount"-style image) stored on each share over SMB/CIFS
  • Linux uses VFS to combine the individual virtual drives into a larger drive (or perhaps two identical-size virtual drives, which are then combined into a single software RAID 1 array?)
  • Linux then shares this Rube-Goldbergian system as a Samba share...

Yes, I know it's kind of silly, and performance seems like it would be pretty pathetic, but the more I think about it, the more I want to see if I could actually do it (think pretty much the same mindset that the IP-over-carrier-pigeon guys had...)

Heck, it might conceivably actually WORK for a large-but-infrequently-accessed historical repository or something...

Or has someone already started some sort of "Virtual ATA-over-ethernet-from-a-file driver for Windows" project and spoiled my fun?...
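
If anyone wants to try it, the plumbing might look roughly like this. Purely a sketch: share names, image sizes and device numbers are all made up.

$ mount -t cifs //winbox1/spare /mnt/w1 -o guest           # each workstation shares an empty directory
$ mount -t cifs //winbox2/spare /mnt/w2 -o guest
$ dd if=/dev/zero of=/mnt/w1/disk.img bs=1M count=4096     # a 4GB image file on each share
$ dd if=/dev/zero of=/mnt/w2/disk.img bs=1M count=4096
$ losetup /dev/loop0 /mnt/w1/disk.img                      # loop-attach the images as block devices
$ losetup /dev/loop1 /mnt/w2/disk.img
$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1   # mirror the pair
$ mkfs.ext3 /dev/md0
$ mount /dev/md0 /mnt/rube                                 # ...then re-export /mnt/rube via Samba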

Re:Converting extra Windows(tm) workstation space? (2, Interesting)

egarland (120202) | more than 9 years ago | (#12443896)

If you want to try building it, I'd suggest you start with a nice high-level method of creating Linux-based filesystems:

http://perlfs.sourceforge.net/ [sourceforge.net]

Build it first, optimize later.

FYI.. The multi-threaded filesystem version exists, I just haven't bundled it up pretty for distribution. Now someone needs to create a multi-threaded samba to share it out.

Re:Converting extra Windows(tm) workstation space? (1)

Dr.Dubious DDQ (11968) | more than 9 years ago | (#12443976)

That's actually the part I'm not sure about. I know I could, for example, format an old 6GB HDD and then use dd to make a filesystem image that I could mount, but I haven't done any digging to find out whether it's possible to directly create a ('standard') filesystem as an image file. (Hints welcome...)

Perlfs looks interesting but it appears as though it hasn't been updated in a while (the homepage talks about adding support for linux "2.5" at some point...)

Re:Converting extra Windows(tm) workstation space? (2, Informative)

LuckyStarr (12445) | more than 9 years ago | (#12444227)

> [...], but I haven't done any digging to find out if it's possible to directly create a ('standard') filesystem as an image file. (Hints welcome...)

Huh? Just run mkfs.whatever on your file. It should work without problems. The filesystem will be as large as it would be on an equally large block device.

Example:

$ mkfs.ext3 file
mke2fs 1.36 (05-Feb-2005)
file is not a block special device.
Proceed anyway? (y,n) y

Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
1784 inodes, 7116 blocks
355 blocks (4.99%) reserved for the super user
First data block=1
Maximum filesystem blocks=7340032
1 block group
8192 blocks per group, 8192 fragments per group
1784 inodes per group

Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
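
(For completeness: the backing file has to exist at the size you want before running mkfs, and afterwards it loop-mounts like any other image. The mount point is arbitrary:)

$ dd if=/dev/zero of=file bs=1024 count=7116    # size the backing file first (7116 1K blocks, matching the run above)
$ mount -o loop file /mnt/img                   # loop-mount the freshly made filesystem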

Re:Converting extra Windows(tm) workstation space? (1)

Dr.Dubious DDQ (11968) | more than 9 years ago | (#12444542)

Thanks - that saves my lazy butt from having to actually look at the man pages or whatever.

Now I have no excuse not to try it...

Re:Converting extra Windows(tm) workstation space? (1)

egarland (120202) | more than 9 years ago | (#12448401)

> Perlfs looks interesting but it appears as though it hasn't been updated in a while

Sorry about that. I'm busy. You did get me motivated to put up the new multi-threaded capable version of perlfs today though. Maybe I'll even edit the home page at some point.

And yes.. it's somewhat actively maintained. No, I haven't worked on it with the 2.6 kernels yet. :)

Re:Converting extra Windows(tm) workstation space? (1)

itzdandy (183397) | more than 9 years ago | (#12444349)

First of all: iSCSI, not Samba.

StarWind will let you export images, or even whole drives and partitions, over iSCSI. WAY less overhead and way faster than SMB. I can only get about 6MB/sec over 100Mbit ethernet with SMB, but I can get ~10.5MB/sec with iSCSI.

Simply add a number of these disk images across your network and then mount them on your Linux system. You can also use LVM to manage these volumes.

I have to warn you that 100Mbit ethernet is a bit slow for data storage on a network, especially when you look at it like this:

comp A wants data in /share/vol1 on server X
1) A requests the data from X.
2) X requests that data from B, where vol1 is stored.
3) B sends the data to X.
4) X sends the data to A.

Where a normal file server is just A->X->A, or 2 steps over ethernet, it's now A->X->B->X->A, or 4 steps, doubling the load on the network.
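
The LVM step is straightforward once the iSCSI images show up as block devices. A sketch, with /dev/sdb and /dev/sdc standing in for two imported disks and "netpool"/"archive" as made-up names:

$ pvcreate /dev/sdb /dev/sdc              # tag the imported iSCSI disks as LVM physical volumes
$ vgcreate netpool /dev/sdb /dev/sdc      # pool them into one volume group
$ lvcreate -L 100G -n archive netpool     # carve out a logical volume
$ mkfs.ext3 /dev/netpool/archive          # filesystem on top; grow it later with lvextend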

We use OpenAFS (4, Insightful)

Bamfarooni (147312) | more than 9 years ago | (#12443446)

We have about 27TB of data from Mars (and are adding another TB per month) that we need to keep online. We have been using NetApps, but at ~$25K/TB, plus maintenance (3 years of maintenance costs about as much as a whole new system), they're just WAY too expensive for data warehousing.

We've moved to using Linux-based OpenAFS servers. A high-quality 3U box (qsol.com [qsol.com]) loaded with 16x 300GB ATA drives costs about $8.5K and provides us about 3.5TB (2 drives for parity, 2 drives for hot-swap). That works out to $2.5K/TB. If your risk tolerance is higher than mine, you can bring that up to 5.5TB, for about $1.5K/TB. We really want 99.999% availability, so just to be safe we keep a 100% redundant read-only copy on a second machine (AFS supports this beautifully, including automatic fail-over).

OpenAFS has a couple of features that make it better than NFS (client-side cache, for instance), but it also has a few drawbacks, like no files >2GB.

Re:We use OpenAFS (2, Informative)

luizd (716122) | more than 9 years ago | (#12443982)

Not anymore in OpenAFS 1.3.81.

Copied from release notes:

For UNIX, 1.3.81 is the latest version in the 1.4 release cycle. Starting in 1.3.70, platforms with pthreads support provide a volserver which, like the fileserver and butc backup system, uses pthreads. Solaris versions 8 and above, AIX, IRIX, OpenBSD, Darwin, MacOS and Linux clients support large (>2GB) files, provided fileservers have this option enabled. HP-UX may also support large files, but has not yet been verified. We hope sites which can do so will make use of 1.3.81 on their UNIX platforms and provide feedback to help us fix any remaining issues before 1.4 is released.

Re:We use OpenAFS (1)

Triumph The Insult C (586706) | more than 9 years ago | (#12445894)

heh. you must be Z, E, or C

hello from M over in PSF =)

Re:We use OpenAFS (1)

Triumph The Insult C (586706) | more than 9 years ago | (#12445949)

Never mind, found out you are none of the three <G>

I was talking to Z at the installfest two weekends ago and he told me about this migration. It sounded pretty cool. We have been eyeing a similar migration too (primarily for authentication, not storage), but we're waaaaaay short on resources here =(

Is data partitionable? (1)

davidwr (791652) | more than 9 years ago | (#12443521)

If your data is partitionable into small-enough discrete units that have low or no inter-unit dependencies, then it should scale almost without limit.

After all, as a collection there is an immense amount of data on "the world wide web" but since it's partitioned, scaling isn't an issue.

Even before the web, the universe of ftp, gopher, news, and other servers held gobs and gobs of data, nicely partitioned.

When answering questions like this, it would help to know the organization of the data and if it can be easily reorganized, or at least viewed from a different "angle" that may offer other solutions you may not have thought of yet.

The IBRIX file system is a strong contender for this. (2, Informative)

schnook (561498) | more than 9 years ago | (#12443725)

Check out http://www.ibrix.com/ [ibrix.com] This is a perfect solution for your requirements. Pixar uses this.

Translation: (0)

Anonymous Coward | more than 9 years ago | (#12443936)

Our data is important enough that all of it needs 24x365 access, but we're too cheap to pay for hardware to support that.

Archive.org (2, Informative)

fulldecent (598482) | more than 9 years ago | (#12444052)

This is the solution archive.org uses.

http://www.archive.org/web/petabox.php [archive.org]

They are on the order of petabytes

Get a copy of this month's LinuxJournal (1)

toddbu (748790) | more than 9 years ago | (#12447861)

There's a great article on ATA over Ethernet (AoE), including a story about a guy who put up 2TB of RAID 10 for $6,500. It looks like a fascinating solution for storing large volumes of data. If your data is primarily static, set up a couple of these machines replicating between themselves and you're good to go.
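
The client side of AoE is pleasantly simple. A sketch, assuming the aoetools package is installed and a shelf has already been exported somewhere on the LAN; the device name is illustrative:

$ modprobe aoe                  # load the ATA-over-Ethernet driver
$ aoe-discover                  # scan the LAN for exported AoE devices
$ ls /dev/etherd/               # exported shelves appear as e<shelf>.<slot>
$ mkfs.ext3 /dev/etherd/e0.0    # then treat the remote disk like any local block device
$ mount /dev/etherd/e0.0 /mnt/aoe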

Avoid Filesystems if you need scalability (1)

macz (797860) | more than 9 years ago | (#12457884)

"We've got a _lot_ of data we'd like to archive... We've been using a NetApp for this, but that solution is just waaaay to expensive to scale. "

The theoretical upper limit of any file system is set by two things: the address space and the efficiency of the data structure.

In a 32-bit system, that means that in theory you could fit 4.2 billion objects into a file system... but don't try it. NTFS craps out at between 15 and 50 million objects, depending on whose numbers you are willing to listen to; ext3 starts to lose performance between 50 and 100 million objects (inodes).

The worst thing you can do, then, is compound the problem by adding filesystems to get capacity. The management of such a complex system becomes untenable without serious automation.

Try looking at content-addressed storage (CAS) at http://www.cascommunity.org/ [cascommunity.org] and see if that approach isn't more scalable. Basically, CAS abstracts the object away from the filesystem by addressing it with a self-referential and unique "valet ticket" (usually hashed with MD5 and/or SHA-256). Present the content address to the system, and it gives you your data.
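
The valet-ticket idea in miniature, as a toy sketch; /cas-store and the file names are made up:

$ hash=$(sha256sum report.pdf | awk '{print $1}')   # the content address is just a hash of the bytes
$ cp report.pdf /cas-store/$hash                    # store the object under its own address
$ cp /cas-store/$hash retrieved.pdf                 # hand back the ticket, get back the data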

silly worker drones... what about CXFS?!!!! (1)

naisho8 (876214) | more than 9 years ago | (#12458543)

Um, the stats speak for themselves:

  • 64-bit scalability, supporting files up to 9 million TB and filesystems up to 18 million TB
  • Instant data sharing, without network mounts or data copies, among all major OSes: IRIX®, Sun(TM) Solaris(TM), IBM® AIX®, Windows® (2000/XP), Linux® (32- and 64-bit), Mac OS® X, and Unix® flavors
  • Highly optimized distributed buffering techniques that provide the industry's fastest performance
  • High availability with automatic failure detection and recovery
  • Centralized, intuitive Java(TM) language-based management tools
  • POSIX® compliance that requires no application change

So if you really do have large volumes of data and want 99.999% uptime, look at this and tremble in awe! http://www.sgi.com/pdfs/2508.pdf [sgi.com]

Re:silly worker drones... what about CXFS?!!!! (1)

macz (797860) | more than 9 years ago | (#12462232)

Cool FS. Questions: where do the five-nines reliability claims come from? And major OS support appears to be through Samba, so doesn't that become a bottleneck in terms of scalability?