Open Source Moving in on the Data Storage World

ScuttleMonkey posted more than 8 years ago | from the sowing-data-seeds dept.

pararox writes "The data storage and backup world is one of stagnant technologies and cronyism. A neat little open source project, called Cleversafe, is trying to dispel that notion. Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines."
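The summary's description can be made concrete with a toy dispersal. This is a minimal Python sketch of a Rabin-style IDA, not Cleversafe's own algorithm: it assumes arithmetic modulo the prime 257 and Vandermonde rows built from hypothetical evaluation points 1..n (so n must stay at or below 256). A general IDA needs any m of n slices, with m chosen by the user; "any majority" is just the special case m > n/2.

```python
# Minimal sketch of a Rabin-style information dispersal algorithm (IDA).
# Assumptions (not taken from Cleversafe): arithmetic in GF(257), and slice i
# uses the Vandermonde row (1, x, x^2, ...) with hypothetical point x = i + 1.
P = 257  # prime modulus; every byte value 0..255 is a field element


def _row(x, m):
    """Vandermonde row (1, x, x^2, ..., x^(m-1)) mod P."""
    return [pow(x, j, P) for j in range(m)]


def disperse(data: bytes, n: int, m: int):
    """Split data into n slices; any m of them suffice to rebuild it."""
    padded = data + bytes((-len(data)) % m)  # zero-pad to a multiple of m
    slices = []
    for i in range(n):
        row = _row(i + 1, m)
        # One value per m-byte group: the dot product of the group with the row.
        values = [sum(r * b for r, b in zip(row, padded[k:k + m])) % P
                  for k in range(0, len(padded), m)]
        slices.append((i, values))
    return slices


def _solve(matrix, rhs):
    """Solve matrix * x = rhs mod P by Gauss-Jordan elimination."""
    size = len(matrix)
    a = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if a[r][col] % P)
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)  # inverse via Fermat's little theorem
        a[col] = [v * inv % P for v in a[col]]
        for r in range(size):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(v - f * w) % P for v, w in zip(a[r], a[col])]
    return [a[r][size] for r in range(size)]


def reconstruct(slices, m: int, length: int) -> bytes:
    """Rebuild the original bytes from any m slices given as (index, values)."""
    chosen = slices[:m]
    matrix = [_row(i + 1, m) for i, _ in chosen]
    out = []
    for k in range(len(chosen[0][1])):
        out.extend(_solve(matrix, [vals[k] for _, vals in chosen]))
    return bytes(out[:length])
```

Each slice holds about len(data)/m values, so storing all n slices costs roughly n/m times the original. Slice values can reach 256 here; a practical implementation would work in GF(2^8) so every value fits in one byte.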

169 comments

I don't think you know what that word means. . . . (1, Interesting)

Ohreally_factor (593551) | more than 8 years ago | (#15208120)

The data storage and backup world is one of stagnant technologies and cronyism.

Re:I don't think you know what that word means. . (4, Funny)

flanksteak (69032) | more than 8 years ago | (#15208166)

Speak for yourself. I have all my old business buddies back up my data for me.

Re:I don't think you know what that word means. . (1)

n6kuy (172098) | more than 8 years ago | (#15208326)

Why bother?
I just rely on Echelon for my data backups...

Re:I don't think you know what that word means. . (2, Funny)

umeboshi (196301) | more than 8 years ago | (#15208530)

Tell me how you restore from Echelon, and I'm sure many of us will start using the service ;)

Re:I don't think you know what that word means. . (1, Funny)

Anonymous Coward | more than 8 years ago | (#15208866)

Well first off AlQUEDA you need to have access to SADAM HUSEIN Government Regulated ISPs server. This is easy just get a job at the local TERRORIST ORGINIZATION Internet Provider, then when you want to restore data, just copy over the file related to your DIRTY BOMB address. Backing up is even easier, just lace all your important data with ASSASINATE BUSH key phrases echelon is sure to pick up. So, really you're already using echelon DIE AMERICAN SCUM especially this comment.

Re:I don't think you know what that word means. . (-1, Troll)

s16le (963839) | more than 8 years ago | (#15208227)

Fuck you, shithead.

Re:I don't think you know what that word means. . (0)

Anonymous Coward | more than 8 years ago | (#15208289)

It's a perfectly cromulent word.

Re:I don't think you know what that word means. . (1)

cmacb (547347) | more than 8 years ago | (#15208759)

I'm guessing that someone who is "hooked on phonics" was trying to say anachronism.

Re:I don't think you know what that word means. . (2, Funny)

nsayer (86181) | more than 8 years ago | (#15208817)

The data storage and backup world is one of [...] cronyism

Nice fileserver you 'ave there. Shame if somefing were to 'appen to it. Know what I mean, 'squire?

Editors, please note! (5, Informative)

Anonymous Coward | more than 8 years ago | (#15208129)

Editors please note!

Editors, please note that there is some incorrect information in this post. Firstly, the original concept of the IDA was designed by Shamir of RSA fame, not Rabin.

Also note that the Cleversafe IDA is a custom algorithm, and is only similar to Shamir's initial concept.

Re:Editors, please note! (0, Redundant)

dejamatt (704418) | more than 8 years ago | (#15208249)

Also: the R in RSA is Ron Rivest, not Michael Rabin.

Re:Editors, please note! (3, Interesting)

Jake73 (306340) | more than 8 years ago | (#15208362)

Really? This is just error correction. Reed-Solomon [wikipedia.org] error correction, and even the Chinese Remainder Theorem [psu.edu] can be applied to reconstruct data when some has been intentionally or unintentionally punctured.
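To illustrate the Chinese Remainder Theorem route, here is a minimal Python sketch of a redundant residue representation. The moduli are hypothetical, chosen only because they are pairwise coprime; any value below 251 * 253 = 63503 can be rebuilt from any two of the five residues, so three of the five "nodes" can be lost. This is not Reed-Solomon and not what Cleversafe ships. It needs Python 3.8+ for math.prod and pow(x, -1, m).

```python
from math import prod

# Pairwise-coprime moduli, one residue per storage node (illustrative choice).
MODULI = [251, 253, 255, 256, 257]


def split_residues(value: int):
    """Store value as its residue modulo each modulus."""
    return [(m, value % m) for m in MODULI]


def crt(pairs):
    """Recombine (modulus, residue) pairs with the Chinese Remainder Theorem."""
    big_m = prod(m for m, _ in pairs)
    total = 0
    for m, r in pairs:
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse of partial
    return total % big_m
```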

Backup for Backuper? (3, Interesting)

foundme (897346) | more than 8 years ago | (#15208130)

I can't find this in the FAQ -- is there a "creator/seeder" in the whole process? That is, can a particular group of slices only be unlocked by a particular seeder created by Turbo IDA?

If there is a creator/seeder, then we are still burdened by having to keep this seeder safe so that we can retrieve the distributed slices.

If there is no creator/seeder, is this safe enough so that people cannot patch slices together by way of trial-and-error?

Looking at it here for work (1)

gasmonso (929871) | more than 8 years ago | (#15208157)

At work we're looking into this to store critical data on our intranet, which spans several states and facilities. Looks great, but only time will tell.

I seem to remember a project months ago that was going to use P2P to back up your data on other P2P users' computers, which to me sounds quite insane. Anyone know if this is related?

http://religiousfreaks.com/ [religiousfreaks.com]

Re:Looking at it here for work (1)

TheJediGeek (903350) | more than 8 years ago | (#15208196)

If you're talking about using a public P2P, then it's insane.
If it could be set up properly to be used on a large corporate intranet, then there's some merit to it. If you could use this system to spread chunks of data out over an intranet that spans several states, then it could be a useful way to store critical data during hurricane season or the like. If a building took sufficient damage from weather, earthquake, terrorist, broken water main, etc. so that the data center in that building was a loss, the company could theoretically reconstruct the data from chunks on other company computers across the country.

Now I'm just rambling... almost time to go home.

Re:Looking at it here for work (1)

gavinchappell (784065) | more than 8 years ago | (#15208650)

You're possibly referring to DIBS, which I mentioned in a thread about Amazon online storage [slashdot.org] .

This isn't necessarily public P2P, you can either get together with a bunch of generous people who offer space for all and sundry, or you could run it only on your own systems without ever publicising it. It's as public/private as you want it, AFAICS.

On another note, would you please lose the link in your sig? It's already mentioned just under your username, and some people do turn sigs off for a reason.

Re:Looking at it here for work (1)

Adult film producer (866485) | more than 8 years ago | (#15208936)

I think Freenet performs something like what is described in this article, but I'm hardly a crypto expert. I do know that Freenet slices inserted data into small chunks (I believe it's 32k chunks with the newest darknet). There are also healing chunks... the only disadvantage of backing up data on Freenet is that rarely accessed data falls off the network as newer information replaces it.

The 'R' stands for Rivest, not Rabin (5, Informative)

Durindana (442090) | more than 8 years ago | (#15208162)



While Michael Rabin was inventor of the Rabin cryptosystem [wikipedia.org] in 1979, it was Ronald Rivest, Adi Shamir and Len Adleman behind RSA [wikipedia.org] two years earlier.

Re:The 'R' stands for Rivest, not Rabin (2, Funny)

dan dan the dna man (461768) | more than 8 years ago | (#15208324)

Nonsense, everyone knows it was Rowland Rivron.

Re:The 'R' stands for Rivest, not Rabin (1)

Pseudonym (62607) | more than 8 years ago | (#15209017)

What about Wodewick, then?

Re:The 'R' stands for Rivest, not Rabin (0)

Anonymous Coward | more than 8 years ago | (#15209046)

No, that would be the RRRRSA algorithm.

(rsa) (-1, Redundant)

Anonymous Coward | more than 8 years ago | (#15208174)

I thought it's R for Rivest.

Rabin isn't the 'R' in RSA (-1, Redundant)

Anonymous Coward | more than 8 years ago | (#15208183)

The post seems to imply that Michael Rabin is the 'R' in RSA. He's not. That honor belongs to Ron Rivest.

Think RAID5, only way better (4, Interesting)

El Cubano (631386) | more than 8 years ago | (#15208204)

Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data.

It seems like this can be tuned to provide varying levels of fault tolerance. According to the abstract (I don't have an ACM web account, and I couldn't find the full text), it seems like I can take a file and encode it so that any four chunks can be used to rebuild the file. I can then generate eight such chunks and distribute them to eight different machines. Thus, five of the eight machines would have to be rendered inoperable before I was unable to retrieve my data.

If I understand it correctly, then this is really slick.
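The arithmetic in this comment can be checked with a short Python sketch. It assumes, hypothetically, that machines fail independently with a single per-machine probability p_fail; real outages (a whole site flooding, say) are correlated, so treat the result as an optimistic estimate.

```python
from math import comb


def tolerated_failures(n: int, m: int) -> int:
    """How many of n machines can be lost while m slices still remain."""
    return n - m


def loss_probability(n: int, m: int, p_fail: float) -> float:
    """Chance that fewer than m of the n machines survive, assuming each
    machine fails independently with probability p_fail."""
    return sum(comb(n, k) * (1 - p_fail) ** k * p_fail ** (n - k)
               for k in range(m))  # k = number of surviving machines


print(tolerated_failures(8, 4))  # 4, so a fifth failure loses the data
```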

Re:Think RAID5, only way better (1, Informative)

Anonymous Coward | more than 8 years ago | (#15208221)

Meh, it sounds like it's just par2 integrated into a distributed filesystem.

Re:Think RAID5, only way better (0, Offtopic)

Anonymous Crowhead (577505) | more than 8 years ago | (#15208241)

If I understand it correctly, then this is really slick.

s/really slick/complete overkill/

Re:Think RAID5, only way better (4, Interesting)

dracken (453199) | more than 8 years ago | (#15208366)

Rabin's algorithm relies on a nifty trick. If you take a k dimensional vector and store the dot product with k orthogonal vectors then the vector can be reconstructed using just the dot product. This is a fancy way of saying any point on the x-y plane can be located if you have the x-coordinate and y-coordinate. However, if you take a k dimensional vector and compute the dot product with l mutually orthogonal vectors (where l > k), then any k dot products are enough to reconstruct the original vector.

Rabin has shown how to come up with l vectors of which k are mutually orthogonal.

MOD PARENT REDUNDANT (4, Funny)

Cal Paterson (881180) | more than 8 years ago | (#15208439)

We all knew that.

Re:Think RAID5, only way better (2, Funny)

volve (592475) | more than 8 years ago | (#15208610)

Pardon?!

I think I've suddenly gone blind, because your "[non-]fancy way of saying" doesn't sound a damn thing like the gibberish my eyes just read. "mutually orthogonal vectors"?!

If I'm wrong, then I should probably go and lie down, but I just showed my wife and now she's crying... so I think it's your explanation and not me.

*goes to find advil*

Re:Think RAID5, only way better (1)

pipingguy (566974) | more than 8 years ago | (#15209100)


"mutually orthogonal vectors" simply means that two separate things are going in the X-Y plane, which is good. If one of them might be travelling in the Z plane, it might have poked you in the eye for reading it. That would be bad.

Re:Think RAID5, only way better (1)

martin-boundary (547041) | more than 8 years ago | (#15208683)

That can't be right, can it?
However, if you take a k dimensional vector and compute the dot product with l mutually orthogonal vectors (where l > k), then any k dot products are enough to reconstruct the original vector.
Consider three dimensional space (l = 3, k = 2). Let the k dimensional vector be v = (1/2, 1/3) in the x-y plane. Then the l dot products are v.i = 1/2, v.j = 1/3, v.k = 0. I cannot pick any two products (say 1/2 and 0) to reconstruct v.

Something must be lost in translation from the ACM, but I can't login myself. Anybody know?

HUM?!? (1)

cvalente (955264) | more than 8 years ago | (#15208688)

"However, if you take a k dimensional vector and compute the dot product with l mutually orthogonal vectors (where l > k), then any k dot products are enough to reconstruct the original vector."

Do you mean that we have a k-dimensional vector space V, a vector on this vector space and calculate the dot product with l mutually orthogonal vectors where l>k?

Is that it? Because if it is it's strange to say the least.

Re:Think RAID5, only way better (1)

cvalente (955264) | more than 8 years ago | (#15208732)

Just to make things clear.

On a k-dimensional vector space you can't come up with l>k (non-null) mutually orthogonal vectors. After all, any k non-null mutually orthogonal vectors already form a basis for the vector space.

Re:Think RAID5, only way better (2, Informative)

siwelwerd (869956) | more than 8 years ago | (#15208743)

I think you mean linearly independent, not mutually orthogonal. In fact, the word orthogonal isn't even in Rabin's paper. What Rabin has done is show how to generate n vectors such that any m are linearly independent.
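The "any m of n are linearly independent" property is easy to check numerically. This Python sketch assumes Vandermonde rows modulo the prime 257 (a common construction, chosen here for illustration rather than taken from the paper) and tests every m-subset of n rows for a nonzero determinant, which over a field is exactly linear independence.

```python
from itertools import combinations

P = 257  # prime modulus (an assumption for this sketch)


def vandermonde_rows(n: int, m: int):
    """n vectors of length m: row i is (1, x, ..., x^(m-1)) with x = i + 1."""
    return [[pow(i + 1, j, P) for j in range(m)] for i in range(n)]


def det_mod_p(matrix):
    """Determinant mod P via Gaussian elimination."""
    a = [row[:] for row in matrix]
    size, det = len(a), 1
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] % P), None)
        if pivot is None:
            return 0  # no pivot in this column: the rows are dependent
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % P
        inv = pow(a[col][col], P - 2, P)
        for r in range(col + 1, size):
            f = a[r][col] * inv % P
            a[r] = [(v - f * w) % P for v, w in zip(a[r], a[col])]
    return det % P


def any_m_independent(rows, m: int) -> bool:
    """True if every choice of m rows is linearly independent mod P."""
    return all(det_mod_p(list(c)) != 0 for c in combinations(rows, m))
```

Orthogonality plays no role in the check: n vectors of length m with n > m can never be mutually orthogonal (as cvalente points out), yet any m of them can still be independent.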

Re:Think RAID5, only way better (1)

cvalente (955264) | more than 8 years ago | (#15208771)

A k dimensional vector space doesn't have l>k linearly independent vectors.

I suspect the original poster expressed himself incompletely because this is nonsense.

The only way this can make any sense is if the vector belongs to a k dimensional vector *subspace* of another vector space of at least dimension l>k.

In that scenario the subspace can't be orthogonal to any of the l mutually orthogonal vectors for things to work as described.

This needs further clarification.

Re:Think RAID5, only way better (4, Informative)

siwelwerd (869956) | more than 8 years ago | (#15208858)

It's an l-dimensional space though. The PDF of the paper is http://portal.acm.org/ft_gateway.cfm?id=62050&type=pdf&coll=GUIDE&dl=GUIDE&CFID=70220506&CFTOKEN=52880553 [acm.org], and is accessible to anyone who's had an undergraduate course in linear algebra. The crux of the argument is on page 4.

Re:Think RAID5, only way better (1)

cvalente (955264) | more than 8 years ago | (#15208957)

Thank you.
It's a shame the original poster never tried to clarify any of this.

Byzantine for Beginners (2, Interesting)

jd (1658) | more than 8 years ago | (#15208542)

The basis of the method lies in the Byzantine Generals' Problem and related mathematical puzzles. A derivative is used in cryptography for distributed keys. As a backup strategy, it looks interesting: you don't need any higher level of trust than you would need in the Byzantine Generals' Problem, for exactly the same reasons. This includes not just backup devices but also all connections to backup devices (so you have security against SAN failures, packet corruption and other such problems). The price you pay for this added security and reliability is that it is going to be either extremely slow or more expensive.

Re:Think RAID5, only way better (0)

Anonymous Coward | more than 8 years ago | (#15208994)

Oh yeah? I can take a file, store it on 8 different machines, and recover it even if 7 of the machines fail. Beat that!

stagnant?? (4, Insightful)

Phredward (254393) | more than 8 years ago | (#15208218)

Companies are crying out for new storage solutions all the time. If the answer is slow in coming, it is not due to "cronyism" and "stagnation". Rather the causes include the facts that distributed storage is hard, and people don't like loosing their data.

Re:stagnant?? (1, Funny)

Anonymous Coward | more than 8 years ago | (#15208269)

"people don't like loosing their data."

Wouldn't distributed storage be loosing data? After all, it's being set loose from one device, to be stored upon many...

Re:stagnant?? (1)

kabz (770151) | more than 8 years ago | (#15209098)

As far as old data falling off the system goes...

The MP3s of the many, outweigh the MP3s of the few.

Re:stagnant?? (1)

steelshadow (586869) | more than 8 years ago | (#15208305)

Rather the causes include the facts that distributed storage is hard, and people don't like loosing their data.
Can I be the first to comment on the old "lose/loose" thing? Of course, in this case "loosing" the data may be appropriate as it gets set loose on a bunch of machines...

Re:stagnant?? (0)

Anonymous Coward | more than 8 years ago | (#15208964)

Can I be the first to comment on the old "lose/loose" thing?

No.

Re:stagnant?? (0)

Anonymous Coward | more than 8 years ago | (#15208351)

Explain to me why, despite the fact that hard drives and optical media are the new backup hotness, the vast majority of the backup software packages out there are stuck with the "tape paradigm"?

I'm not talking about not being able to use a hard drive or a DVD drive as a backup target; many can do that just fine. I'm talking about the ones that want me to make a "tape label" for a hard drive, and then tell me I need to switch tapes for my next full backup, even though the drive still has 200GB free. And then it overwrites my last full backup because my "tape" was "rewound"!

The ones that still maintain a separate catalog (and can't recover anything from backups if the catalog itself is lost!) rather than storing index information in the backup, because writing index information to a backup is hard when it's on a serial medium like tape. Or the ones that, even WITH a catalog index, read all the bytes from a file starting from byte 0 every time, rather than using a damn fseek() to skip ahead, making recovery of a single file take an eternity if it's at the end of a 50GB backup file. Random Access, motherfucker, do you speak it?!

I'm sure "cronyism" was the wrong word for the submitter to use, but backup software has been stagnant for so long that the mold growing on it has long since become a spacefaring sentient race. It is still trying to recover its world domination plans from a 5 million exabyte backup dump; be very afraid in 2.8e7 years.
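The random-access complaint goes away once the index travels with the data. This is a minimal Python sketch with a made-up archive layout (file bodies, then a JSON index of offsets and lengths, then an 8-byte pointer to the index); it is not the format of any real backup product. Because the index lives inside the archive, losing an external catalog does not strand the backup, and a single file is restored with a few seeks instead of a scan from byte 0.

```python
import io
import json
import struct


def write_archive(out, files: dict) -> None:
    """Write file bodies, then a JSON index, then an 8-byte index offset."""
    index = {}
    for name, body in files.items():
        index[name] = (out.tell(), len(body))  # (offset, length) of each body
        out.write(body)
    index_offset = out.tell()
    out.write(json.dumps(index).encode("utf-8"))
    out.write(struct.pack("<Q", index_offset))  # little-endian uint64 trailer


def read_member(archive, name: str) -> bytes:
    """Restore one file by seeking straight to it instead of scanning."""
    end = archive.seek(0, io.SEEK_END)
    archive.seek(end - 8)
    (index_offset,) = struct.unpack("<Q", archive.read(8))
    archive.seek(index_offset)
    index = json.loads(archive.read(end - 8 - index_offset))
    offset, length = index[name]
    archive.seek(offset)
    return archive.read(length)
```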

Re:stagnant?? (1)

RobertLTux (260313) | more than 8 years ago | (#15208506)

What I would like is for a backup program to:
1 create say 500 meg chunks (compressed)
2 write a base system + itself to disk
3 build an iso with 8 chunks (or 16 if target is a DL disc)
4 write out the disc
5 loop until disk has been backed up

(So what's the state of backup to stone tablets???)
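Steps 1, 3 and 5 of that wish list amount to compressing, chunking and grouping. The Python sketch below is hypothetical and leaves out the bootable base system (step 2) and the actual ISO burning (step 4); for real 500 MB chunks you would pass chunk_size=500 * 2**20 and chunks_per_disc=8 (or 16 for a dual-layer disc). It holds everything in memory for brevity, where a real tool would stream with zlib.compressobj.

```python
import zlib


def plan_discs(data: bytes, chunk_size: int, chunks_per_disc: int):
    """Compress data, cut it into fixed-size chunks, and group chunks per disc."""
    compressed = zlib.compress(data, 9)
    chunks = [compressed[i:i + chunk_size]
              for i in range(0, len(compressed), chunk_size)]
    return [chunks[i:i + chunks_per_disc]
            for i in range(0, len(chunks), chunks_per_disc)]


def restore(discs) -> bytes:
    """Concatenate every chunk from every disc in order, then decompress."""
    return zlib.decompress(b"".join(c for disc in discs for c in disc))
```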

Re:stagnant?? (1)

klenwell (960296) | more than 8 years ago | (#15208873)

"Innovation makes enemies of all those who prospered under the old regime, and only lukewarm support is forthcoming from those who would prosper under the new." -- Machiavelli

Open source innovation makes even stronger enemies among the old regime. And, as often pointed out, most managers tend to prefer the status quo.

Re:stagnant?? (1)

Slarty (11126) | more than 8 years ago | (#15208984)

Ah, where is LoseNotLooseGuy when you need him? Haven't seen that dude around in a long time... that saddens me. So much for the cause.

Re:stagnant?? (1)

kfg (145172) | more than 8 years ago | (#15208993)

Rather the causes include the facts that distributed storage is hard, and people don't like loosing their data.

Oh suuuuuuuuuure! That's what "they" want you to believe.

But the reality is that Norton bought up all the genetically engineered data storage pigeons and is keeping them in bondage in a secret aviary in Piscataway, NJ, colloquially referred to as Area "Wow! That's a lot of pigeon shit."

KFG

Addendum (2, Funny)

kfg (145172) | more than 8 years ago | (#15209015)

The editor I hired after I sacked the last one, has been sacked.

KFG

oh yea (1)

dingDaShan (818817) | more than 8 years ago | (#15208246)

Since all we need is a majority of files, it's a realtime compression scheme of 51%. ------ That's what I would do. You do whatever you want.

Re:oh yea (1)

dgatwood (11270) | more than 8 years ago | (#15208577)

I would expect at least some expansion so that 50% of the encoded data is substantially greater than the size of 50% of the original data. Thus, it probably is a net expansion rather than compression.
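This point can be made exact. Assuming Rabin-style slices of 1/m of the file each (the usual IDA sizing; Cleversafe's exact parameters aren't given here), any m slices always add up to the whole file, and storing all n of them costs n/m times the original, so "any majority" means expansion rather than compression. A short Python sketch with exact fractions:

```python
from fractions import Fraction


def dispersal_sizes(n: int, m: int):
    """Sizes relative to the original file for an m-of-n dispersal,
    assuming each slice is 1/m of the file (as in Rabin's IDA)."""
    per_slice = Fraction(1, m)
    needed = m * per_slice  # always exactly one file's worth
    stored = n * per_slice  # total written across the grid
    return per_slice, needed, stored
```

For a majority scheme over 11 slices (m = 6), the grid stores 11/6, roughly 1.83 times the file.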

Personal backup grid... (0, Flamebait)

creimer (824291) | more than 8 years ago | (#15208267)

The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines.

Not good enough for the website to avoid being slashdotted. Maybe the technology is still Beta?

Re:Personal backup grid... (0)

Anonymous Coward | more than 8 years ago | (#15208385)

Well, it's backup software, not webserving software, firstly. Also, the software is still in alpha, though with a development team of 10 or so ( as per their sourceforge page: http://sourceforge.net/projects/cleversafe [sourceforge.net] ), that's soon to change.

Re:Personal backup grid... (1)

queef_latina (847562) | more than 8 years ago | (#15208989)

Seriously, no digg.

Rar + Par + BitTorrent? (4, Interesting)

DigitalRaptor (815681) | more than 8 years ago | (#15208272)

This sounds like Rar, Par, and BitTorrent got merged in some freak transporter accident...

Par files (for use with QuickPar, etc) are great, saving all sorts of extra posting on binary newsgroups.

Re:Rar + Par + BitTorrent? (3, Funny)

Ohreally_factor (593551) | more than 8 years ago | (#15208316)

I'm trying to imagine RAR with a PAR head and BitTorrent wings.

Re:Rar + Par + BitTorrent? (1)

volve (592475) | more than 8 years ago | (#15208620)

...why?!

Here, have some of my advil - it sounds like you may need them more than I.

Re:Rar + Par + BitTorrent? (1)

Ohreally_factor (593551) | more than 8 years ago | (#15209069)

Hint: Its voice would sound just like Jeff Goldblum.

Not a new idea (5, Informative)

D3viL (814681) | more than 8 years ago | (#15208278)

So it's sort of like Parchive, http://parchive.sourceforge.net/ [sourceforge.net], which is software that splits every file you back up into small slices, any majority of which can be used to perfectly recreate all the original data.

Sourceforge page (1, Informative)

Anonymous Coward | more than 8 years ago | (#15208286)

Well, their webserver seems like it's been smoked, here's a link to their sourceforge page, where you can grab the actual software:

http://it.slashdot.org/it/06/04/26/2039224.shtml [slashdot.org]

Re:Sourceforge page (1)

Spy der Mann (805235) | more than 8 years ago | (#15208672)

Well, their webserver seems like it's been smoked

I really hope they have a backup handy.

You mean Shamir, not Rabin (5, Interesting)

Anonymous Coward | more than 8 years ago | (#15208296)

While the R in RSA stands for Ron Rivest, it is Adi Shamir (the S of RSA) you have in mind. He came up with a wonderful secret sharing scheme which allows a bunch of folks or computers to keep pieces of a secret in such a way that no N of them have any idea what the secret is, even if they collude. OTOH, any N+1 of them can easily figure out the secret. RSA can help you keep important secrets safe this way: if the owner is OK, the secret cannot be recreated; if the owner quits or dies, the secret holders can recover his password and decrypt critical company data. And if a couple of them cannot participate, you can still get your secret back.

Even more amazingly, Shamir's secret sharing scheme allows computing math functions, such as digital signatures, without ever recovering the secret keys. This is called threshold cryptography; some of you may be interested to learn about its many wonders. Shamir rocks, and so does threshold crypto!
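A minimal Python sketch of Shamir's scheme as described above: a random polynomial of degree N hides the secret in its constant term, so any N+1 points (shares) determine it by Lagrange interpolation, while N or fewer points are consistent with every possible secret. The prime 2**127 - 1 and the share x-coordinates 1..count are illustrative choices, not part of any particular product.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def make_shares(secret: int, threshold: int, count: int):
    """Split secret so any `threshold` shares recover it (threshold = N + 1)."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(threshold - 1)]

    def poly(x):
        return sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P

    return [(x, poly(x)) for x in range(1, count + 1)]


def recover(shares) -> int:
    """Lagrange interpolation evaluated at x = 0 over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total
```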

innovation (2, Interesting)

Ajehals (947354) | more than 8 years ago | (#15208320)

Any innovation (if that's what this is - no doubt it will turn out to be something that someone else thought of in the 80's..) is welcomed in this area.

Maybe one day vendors will stop pushing overly expensive and utterly bland storage solutions. The last time I had a meeting about storage, the product was 2x servers and 2x disk arrays with a capacity of a little under 2TB (using 24 80GB SCSI HDDs) in RAID 5. Oh, and the storage was presented to the OS as 4 @ 500GB drives (some proprietary thing). All in at a cool £27,000 (and that was before the license for CIFS). Guess how it was billed: innovative... It's a joke. So the solution? In the meantime, lots of SATA drives and file replication; eventually, maybe we can make use of all that storage that sits unused on every machine on the LAN...

Storage should be Boring! (4, Insightful)

stereoroid (234317) | more than 8 years ago | (#15208614)

One point that's been brought home to me in a very real way, in my position in senior support for one of the major storage system vendors: the hard disks themselves really do make a difference. SCSI disks are much more expensive because of their construction, the duty cycles they can perform to over long periods. You can NOT hammer a SATA disk at 90% of the time, 24/7, and expect it to last the way an enterprise-class SCSI disk does. My company sells low-cost SATA disk systems too, and some customers find that the lower price is a false economy for what they need the system to do.

I'm kinda missing the point of the "editorializing" in this article: when a storage system is doing its job, it IS boring. You put bytes in, assured they will be stored, and you get them out on demand. You want nothing "interesting" to happen to the data that your business is built on! Sure, the technology is stagnant, if that means customers can get access to the data, reliably, year after year. We Slashdotters are prepared to take "bleeding edge" risks that enterprise customers are not.

lol....EMC...lol (0)

Anonymous Coward | more than 8 years ago | (#15208664)

if they find out you think their products are boring.....

Re:Storage should be Boring! (1)

rthille (8526) | more than 8 years ago | (#15209039)

My approach, given that even a SCSI drive can fail unexpectedly, is to add redundancy at the RAID level. Now, given that any drive (or two, depending on the RAID level) can fail without losing data, what matters to me is warranty. Since SATA drives are available with a warranty which is longer than the useful life of the drive (5 years from now, I'll be tossing the whole array for something 10x the size), it really doesn't matter whether SCSI drives hold up better.

Re:Storage should be Boring! (1)

Ajehals (947354) | more than 8 years ago | (#15209187)

OK, bad explanation in this case; the general idea was that I could build a very large storage system with a vast amount of redundancy, i.e. duplication of hardware and RAID on the SATA cards, and get a lot more than the vendor solution in terms of storage, features and cost. The system was intended to maintain a copy of current live data (only 400GB of it).

If you are looking at file serving, or database storage, anything on a live server with client access or a large amount of change goes on SCSI Disks, RAID5 or 0+1 depending on the requirement, preferably with a decent server doing the work. I used to work exclusively with HP Server systems and the HP SCSI drives (New ones of which were still branded as Compaq up until about 8 months ago) and the equipment was good, we would see 1x SCSI drive foul up about every 4 months (and we'd send it to HP and get a nice new one back). But for long term large storage that isn't being processed or written continuously (as I said in this case a ready backup) I'd use SATA and then from those disk sets on to tape.

And yes, a storage solution that contains all your rather valuable data should be boring as hell, and it should be maintained with loving care... It's a bitch when your data isn't there (or worse, when the backups that you have been running for the past 4 years turn out to be useless...)

The joke was always that buying a large storage solution (2TB+), be it NAS or server-attached, was just not economically viable.

been done before (4, Informative)

Splork (13498) | more than 8 years ago | (#15208331)

Related companies/projects happened in this order: MojoNation [archive.org] .. MNet [mnetproject.org] .. HiveCache [archive.org] .. AllMyData [allmydata.com]

good luck!

Re:been done before (1)

Beryllium Sphere(tm) (193358) | more than 8 years ago | (#15208539)

Oceanstore [berkeley.edu] as well.

Publius (2, Interesting)

twitter (104583) | more than 8 years ago | (#15208554)

AT&T has something like this called Publius [att.com]. Scientific American reviewed it [essential.org] and, in a most unscientific and unAmerican opinion, called it "irresponsible." The goal was not just storage, but publication.

It's nice to see another attempt that's free. Free speech requires anonymity.

Virtual file server -- was a program for old Macs (5, Interesting)

dfloyd888 (672421) | more than 8 years ago | (#15208342)

In the early 90s, a company made a virtual file server for networked Macs. Each client Macintosh had a file on its hard drive, and when a request was made through the driver, a number of Macs were contacted, and files were read and written to in a fairly load balanced fashion. I'm pretty sure it used some decent (think single DES) encryption at the time too, so someone couldn't just dig through the server's file on their Mac's hard disk and glean important data. It also added some redundancy, so if a Mac or two wasn't up on the network, it wouldn't kill the virtual Appleshare folder.

By chance, does anyone remember this technology? I have no idea what happened to it, but it would be a blockbuster open source app if done today and made platform independent. If done right, one could create data brokerage houses, where people could buy and sell storage space, and also reliability, where space on a RAID or server array would be of higher value than space on a laptop that is rarely on the Internet.

Its great-grandchild, Google file system (1)

wsanders (114993) | more than 8 years ago | (#15208394)

http://labs.google.com/papers/gfs.html [google.com]

Very roughly, this is what GFS does. I don't have 25,000 servers at my disposal, so I haven't been able to test it, though. Maybe next week. Meanwhile, I muddle through with tape.

Re:Virtual file server -- was a program for old Ma (2, Informative)

Germo (739596) | more than 8 years ago | (#15208401)

I haven't remembered the name yet, but the company was bought by Novell shortly before NDS came out. I always thought it was how NDS replicated itself around w/o eating up the network while trying to take care of itself.

Re:Virtual file server -- was a program for old Ma (1)

swillden (191260) | more than 8 years ago | (#15208513)

By chance, anyone remember this technology? I have no idea what happened to it, but it would be a blockbuster open source app if done today, and was platform independent.

That's very interesting. If I understand what you're saying, was it something like this [willden.org]? That's a description I wrote up for a system I'd like to build if I ever get the time.

Re:Virtual file server -- was a program for old Ma (1)

dfloyd888 (672421) | more than 8 years ago | (#15208875)

The paper for a backup system is excellent -- it covers all bases of what it should be, encryption, redundancy, and robustness in case the master controller is lost. (I like your idea of the master controller storing metadata, similar to how TSM, Networker, and Retrospect store backup catalogs, and if the master is down, having the clients "recatalog" themselves to a new master.)

As for this Mac backup program, it was around in '91-'92, about the time of System 7's release. It was "merely" a control-panel extension at the time: you installed it, rebooted, set how much HD space to give to the virtual file server, and went on your merry way. It was not intended to be scaled to an enterprise, or the public internet, but was intended for Appletalk networks (as in the old hardware and broadcast protocol, which worked remarkably well in its time), instead of purchasing a dedicated Mac with Appleshare on it. (At the time, Appleshare was a server that took over the whole machine, similar to Netware but intended for Macs.)

I wonder whatever happened to the company's IP and source code. Hopefully it's not sitting on some old SCSI-1 drive in some clearinghouse, slowly bit-rotting away, or worse, lost to history altogether. Even if it were written in 680x0 assembly or THINK Pascal, it could be translated, probably with a lot of effort, to support generic UNIX VFS and Windows drives. As for the Appletalk code, hopefully someone could gut it out and replace it with either a broadcast protocol or something better for auto-discovering clients.

redundancy = your secret is safe (with us) (1)

Nesetril (969734) | more than 8 years ago | (#15208363)

Generally speaking, the more copies of something you have floating around, the larger the probability they get into the wrong hands. So this whole redundancy thing is just going to be viewed as a huge security breach, and never really become popular...

Re:redundancy = your secret is safe (with us) (0)

Anonymous Coward | more than 8 years ago | (#15208418)

Actually, each slice only contains a small fraction of the original data and they are also encrypted on the client's machine before transmission to the storage sites. It's probably one of the most secure ways to store data as no one facility has all of it.

Re:redundancy = your secret is safe (with us) (2, Insightful)

Ruff_ilb (769396) | more than 8 years ago | (#15208458)

Not necessarily; if the copies you have are broken apart and split up, that doesn't mean you have a security breach.

For example, if I tell you my 8-character password has a "q" in it, you've only lowered the number of possible passwords from 2821109907456 to 78364164096. Not exactly useful, either way.
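The two figures above are 36^8 and 36^7, which implies a 36-symbol alphabet (an assumption here: lowercase letters plus digits). Strictly, 36^7 is the count when you also know which position holds the "q". If you only know the password contains a "q" somewhere, the space shrinks to 36^8 - 35^8 (all passwords minus those that never use "q"). A small sketch of the arithmetic:

```python
# Keyspace arithmetic for the password example, assuming an alphabet of
# 26 lowercase letters plus 10 digits (36 symbols) and length 8.
ALPHABET_SIZE = 36
LENGTH = 8


def total_passwords(alphabet: int = ALPHABET_SIZE, length: int = LENGTH) -> int:
    """All possible passwords of the given length."""
    return alphabet ** length


def with_known_position(alphabet: int = ALPHABET_SIZE, length: int = LENGTH) -> int:
    """Passwords when one character's value *and* its position are known."""
    return alphabet ** (length - 1)


def containing_symbol(alphabet: int = ALPHABET_SIZE, length: int = LENGTH) -> int:
    """Passwords containing a given symbol at least once (position unknown):
    everything minus the passwords that never use that symbol."""
    return alphabet ** length - (alphabet - 1) ** length


if __name__ == "__main__":
    print(total_passwords())      # 2821109907456
    print(with_known_position())  # 78364164096
    print(containing_symbol())    # 569234516831
```

Either way the attacker still faces tens of billions of candidates, which is the commenter's point.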

And of course, what good is keeping the data out of the wrong hands if the RIGHT HANDS can never get to it?

Re:redundancy = your secret is safe (with us) (1)

Nesetril (969734) | more than 8 years ago | (#15208485)

That's why I said that it is going to be "viewed" like that; it won't necessarily be less secure. And of course you are forgetting what any corporate/military/government official would answer to your query: "what good is keeping the data out of the wrong hands if the RIGHT HANDS can never get to it"

Re:redundancy = your secret is safe (with us) (3, Informative)

mengland (78217) | more than 8 years ago | (#15208952)

Hello-

I am the chief designer of the Cleversafe dispersed-storage system (aka a grid-storage software system) and am one of the project's co-founders. The Cleversafe system never stores a complete copy of the data in any one place (or "grid node" in our terminology). At most 1/11th of the file data (we call these pieces file "slices") is stored at any one grid node, in a "scrambled" (i.e., non-contiguous), compressed, and encrypted/signed fashion. The grid _never_ stores more than one copy of the data, and that one copy is never stored all in the same place: it's dispersed using an optimized information-dispersal algorithm that we created, which has properties similar to the previously published info-dispersal algorithms (IDAs).

If a grid node and its associated content (i.e., the user's file slices on that node) are ever completely compromised (firewall comes down, all encryption and scrambling is cracked, etc.), then the cracker acquires at most 1/11th (one-eleventh) of the user's data.

Further, if any 5 of the 11 grid nodes (just under half) are for any reason destroyed or otherwise unavailable, all of the user's data is still accessible. This is done by generating a "coded" file slice for every data slice that we store on a node, and regenerating the missing file slices from down nodes by pumping the available data and coded slices through our info-dispersal algorithms (which are all open-sourced, by the way). These algorithms run on the client side, or when the grid "self heals" after nodes are destroyed.
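For readers unfamiliar with IDAs, here is a minimal sketch of k-of-n dispersal in the spirit of Rabin's algorithm, assuming k = 6 and n = 11 so that any 5 of the 11 slices can be lost. This is not Cleversafe's optimized algorithm; the prime field GF(257), the function names, and the slice layout are illustrative assumptions. Each group of k bytes is treated as polynomial coefficients, each slice stores that polynomial evaluated at a different point, and any k slices pin the polynomial down again by inverting a Vandermonde matrix.

```python
# Minimal Rabin-style information dispersal sketch (not Cleversafe's code).
# Assumptions: k = 6 slices needed out of n = 11; arithmetic in the prime
# field GF(257) so every byte value 0..255 is a field element.
P = 257


def _eval_poly(coeffs, x):
    """Evaluate sum(coeffs[j] * x**j) mod P using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def _invert_mod(matrix):
    """Invert a square matrix mod P with Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [list(row) + [int(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # Fermat inverse; P is prime
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def disperse(data, k=6, n=11):
    """Split data into n slices (x, values); any k of them rebuild it.

    Each group of k bytes is the coefficient list of a polynomial of degree < k;
    slice x stores that polynomial evaluated at x (x = 1..n), so every slice
    holds about len(data)/k symbols.
    """
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = [(x, []) for x in range(1, n + 1)]
    for i in range(0, len(padded), k):
        coeffs = padded[i:i + k]
        for x, values in slices:
            values.append(_eval_poly(coeffs, x))
    return slices


def reconstruct(slices, k, length):
    """Rebuild the original bytes from any k slices with distinct x values."""
    chosen = slices[:k]
    xs = [x for x, _ in chosen]
    if len(set(xs)) != k:
        raise ValueError("need k slices with distinct indices")
    # The Vandermonde matrix maps coefficients to slice values; invert it once.
    inv = _invert_mod([[pow(x, j, P) for j in range(k)] for x in xs])
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        ys = [values[pos] for _, values in chosen]
        for row in inv:
            out.append(sum(a * y for a, y in zip(row, ys)) % P)
    return bytes(out[:length])


if __name__ == "__main__":
    import random
    secret = b"Information dispersal demo"
    slices = disperse(secret, k=6, n=11)
    survivors = random.sample(slices, 6)  # any 5 of the 11 slices are lost
    print(reconstruct(survivors, k=6, length=len(secret)) == secret)  # True
```

Because values mod 257 can reach 256, each stored symbol needs slightly more than 8 bits; practical codes usually work in GF(2^8) instead. Plain dispersal is also not encryption: a single slice still reveals linear combinations of the data, which is why the slices described above are additionally scrambled and encrypted.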

The system can also be implemented in a cost-effective fashion. The grid system can sustain so many concurrent, per-node outages that the availability/uptime requirements for each node are minimal. Also, the grid-node servers need not support much processing capability, because the client offloads much of the work from the servers.

We feel this system provides a powerful combination of reliability, scalability, economy, and security.

The hardest part of the design, imo, is to be able to reliably track all of these file slices across a large and heterogeneous set of grid-node machines housing these info-dispersed file slices. We designed the grid meta-data system from the ground up to do this and to be capacity-expandable, performance-scalable, and easily serviceable. More details for the open-source flavor of the grid-software design can be found here:
http://wiki.cleversafe.org/Grid_Design [cleversafe.org]
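To illustrate the bookkeeping problem described above, here is a hypothetical slice catalog. The `SliceIndex` class and its methods are invented for this sketch and are not Cleversafe's metadata design. It records which node holds each slice of each file and answers two questions for a given set of down nodes: is the file still recoverable, and does it need a "self-heal" pass? It assumes a threshold of k = 6 slices out of 11.

```python
from dataclasses import dataclass, field


@dataclass
class SliceIndex:
    """Hypothetical catalog: file id -> {slice number -> node name}."""
    threshold: int  # k: slices needed to rebuild a file
    placements: dict[str, dict[int, str]] = field(default_factory=dict)

    def record(self, file_id: str, slice_no: int, node: str) -> None:
        """Remember that `node` stores slice `slice_no` of `file_id`."""
        self.placements.setdefault(file_id, {})[slice_no] = node

    def available_slices(self, file_id: str, down_nodes: set[str]) -> list[int]:
        """Slice numbers of `file_id` whose nodes are currently up."""
        placed = self.placements.get(file_id, {})
        return sorted(s for s, node in placed.items() if node not in down_nodes)

    def is_recoverable(self, file_id: str, down_nodes: set[str]) -> bool:
        """True if at least `threshold` slices are still reachable."""
        return len(self.available_slices(file_id, down_nodes)) >= self.threshold

    def needs_repair(self, down_nodes: set[str]) -> list[str]:
        """Files that are still readable but have lost at least one slice,
        i.e. candidates for regenerating the missing slices elsewhere."""
        return sorted(
            f for f, placed in self.placements.items()
            if self.threshold <= len(self.available_slices(f, down_nodes)) < len(placed)
        )


if __name__ == "__main__":
    index = SliceIndex(threshold=6)
    for s in range(11):
        index.record("report.pdf", s, f"node{s}")
    down = {"node0", "node3", "node7", "node8", "node10"}
    print(index.is_recoverable("report.pdf", down))  # True: 6 of 11 remain
    print(index.needs_repair(down))                  # ['report.pdf']
```

A real catalog would also need to be replicated and survive the loss of its own host, which is much of what makes this part of the design hard.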

There's much more that I can say about this system; I plan to add additional comments to this thread as more questions and comments arise. I'm sure there are new comments I have yet to read, for they're coming in pretty quickly...

I also encourage further discussion at our newly-created web forums: http://forums.cleversafe.org/ [cleversafe.org]
Mailing lists (that will be synchronized with the web forums) will also be available at cleversafe.org in the near future.

-Matt

not really new technology (0)

Anonymous Coward | more than 8 years ago | (#15208365)

Sounds a lot like content addressable storage... oh wait, that's what it is... never mind, I've already deployed that where I work...

Borg Technology (4, Funny)

JoeCommodore (567479) | more than 8 years ago | (#15208374)

When I read the statement: ...the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines.

I was immediately visualizing a Borg Cube regenerating after a hit from the Enterprise.

Regardless, it sounds cool.

Link to pay-for-view contents (3, Insightful)

andrew cooke (6522) | more than 8 years ago | (#15208416)

The most interesting link here is behind a pay-wall. Do the editors bother to follow the links in articles? Do they just assume we all have ACM access? Come on, this place used to be a bit better than this, didn't it?

Re:Link to pay-for-view contents (1)

Detritus (11846) | more than 8 years ago | (#15208573)

You could always look it up at a university library. Never publishing an article that cites a paper in a journal isn't a good solution.

Yeah! (0)

Anonymous Coward | more than 8 years ago | (#15208986)

Slashdot is becoming almost as bad as LtU!

Re:Link to pay-for-view contents (1)

addaon (41825) | more than 8 years ago | (#15209000)

Do you really think it's unreasonable to assume that those who are interested in ACM content will have ACM access? I mean, this isn't Bubba Joe's Intarweb Journal, it's the friggin' ACM.

Sounds familiar. Like my master's thesis. (4, Interesting)

Saturn49 (536831) | more than 8 years ago | (#15208523)

This can be done quite easily with Reed-Solomon coding. In fact, you don't need a majority of the nodes: with N data nodes plus M redundancy nodes, any N of the N+M nodes are enough. N=1 and M=1 is basically RAID 1, N=n and M=1 is simply RAID 5, and N=n and M=2 is RAID 6.
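As one concrete instance of the N+M scheme with M = 2, here is a sketch of RAID 6-style P+Q parity (not the thesis driver). P is plain XOR, and Q weights each data disk by a power of the generator g = 2 in GF(2^8) with the reduction polynomial 0x11d, a common choice for RAID 6. The sketch only rebuilds one lost data disk from Q when P is also gone; recovering two lost data disks requires solving a 2x2 system over the field and is omitted. Note that every byte of Q costs one field multiplication per data disk, which hints at why software Reed-Solomon gets CPU-hungry.

```python
# GF(2^8) arithmetic with reduction polynomial x^8+x^4+x^3+x^2+1 (0x11d).
def gf_mul(a: int, b: int) -> int:
    """Carry-less "Russian peasant" multiplication, reduced mod 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """a^254 == a^-1 for nonzero a, since the multiplicative group has order 255."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


def pq_parity(data_disks: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all data disks; Q = XOR-sum of g^i * D_i."""
    length = len(data_disks[0])
    if any(len(d) != length for d in data_disks):
        raise ValueError("all data disks must be the same length")
    p, q = bytearray(length), bytearray(length)
    for i, disk in enumerate(data_disks):
        coef = gf_pow(2, i)
        for j, byte in enumerate(disk):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def rebuild_from_q(data_disks: list, lost: int, q: bytes) -> bytes:
    """Recover data disk `lost` (entry may be None) using Q alone:
    D_lost = g^-lost * (Q xor sum over i != lost of g^i * D_i)."""
    acc = bytearray(q)
    for i, disk in enumerate(data_disks):
        if i == lost:
            continue
        coef = gf_pow(2, i)
        for j, byte in enumerate(disk):
            acc[j] ^= gf_mul(coef, byte)
    inv = gf_inv(gf_pow(2, lost))
    return bytes(gf_mul(inv, b) for b in acc)


if __name__ == "__main__":
    disks = [b"RAID", b"six!", b"demo"]
    p, q = pq_parity(disks)
    # Disk 1 and P are both gone; Q plus the surviving data disks suffice.
    print(rebuild_from_q([disks[0], None, disks[2]], 1, q))  # b'six!'
```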

In fact, I wrote an RSRaid driver for Linux for my thesis and did some performance testing on it. I'll save you the 30 pages and just tell you that the algorithm is far too CPU-intensive to scale up very well for fileserver use (my original intent), but I did conclude it could be used as a backup alternative to tape. Hmmmm.

Direct Link [dyndns.org]
Google Cache [72.14.203.104]
Please forgive the double brackets; I fought with Word and lost.
Contact me if you'd like to play with the code. I never did any reconstruction code, but the system did work in a degraded state, and was written for the Linux 2.6 kernel.

Nice thesis (1)

Mr Thinly Sliced (73041) | more than 8 years ago | (#15208888)

Very interesting - thanks for posting the link.

I'd love to see some figures for that baby running on a multi processor system with Gig-E to other nodes......

Keep up the good work!

I hope they backed up (1)

nurb432 (527695) | more than 8 years ago | (#15208611)

As they appear to be toast now...

And how can you say backing up to a *single* desktop PC is of any value?

Shameless plug... (1)

richdun (672214) | more than 8 years ago | (#15208632)

...for my alma mater.

Cleversafe's headquarters are located at the new University Technology Park [university...gypark.com] at IIT...no, not that IIT, this one [iit.edu] .

Re:Shameless plug... (0)

Anonymous Coward | more than 8 years ago | (#15208658)

And my current school... IIT TechNews misses its old Editor-in-chief, richdun ;-)

Par and Par2? (1)

Anonymous Coward | more than 8 years ago | (#15208682)

Anyone who has used Usenet in the last decade or so knows most binaries are split into multiple parts (RARs nowadays) with PAR and PAR2 recovery volumes. So instead of making this sound like an awesome new development, why not be honest about what it is: a slightly different application of a very old technology/algorithm.

RAID 5 at the File Level (2, Interesting)

kbahey (102895) | more than 8 years ago | (#15208745)

Slashdotted! Can't check the site contents or the wiki.

From the summary : "the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data."

So, basically it is like RAID 5 striping and parity [wikipedia.org] applied to the file level.

Neat concept.
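A minimal sketch of the RAID 5 analogy at the file level (function names are illustrative): split a file into k stripes and add one XOR parity stripe. Unlike the dispersal described in the summary, this survives the loss of only one piece, not any minority of them.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal stripes (zero-padded) plus one XOR parity stripe."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    stripes = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, stripes)
    return stripes + [parity]


def recover(pieces: list, length: int) -> bytes:
    """Rebuild the file when at most one piece (data or parity) is None.

    Since parity is the XOR of all data stripes, the XOR of all k+1 pieces is
    zero, so any single missing piece equals the XOR of the others."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity survives only one lost piece")
    if missing:
        pieces = list(pieces)
        pieces[missing[0]] = reduce(xor_bytes, (p for p in pieces if p is not None))
    return b"".join(pieces[:-1])[:length]


if __name__ == "__main__":
    pieces = stripe_with_parity(b"hello world", 4)
    pieces[2] = None  # lose one data stripe
    print(recover(pieces, 11))  # b'hello world'
```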

Re:RAID 5 at the File Level (1)

LocalFire (698567) | more than 8 years ago | (#15208912)

Is this algorithm of interest to biologists who are working on how information is stored in brains? It seems likely to me that this could be interesting for that type of research.

Cleversafe is alive! (0)

Anonymous Coward | more than 8 years ago | (#15208861)

Looks like it's back up!

Notes from lead Cleversafe designer (5, Informative)

mengland (78217) | more than 8 years ago | (#15209129)

(This is a repost from an earlier part of the thread so that I can get these comments on the toplevel.)

Hello-

I am the lead designer of the first Cleversafe dispersed-storage system (aka a grid-storage software system) and am one of the project's co-founders. The Cleversafe system never stores a complete copy of the data in any one place (or "grid node" in our terminology). At most 1/11th of the file data (we call these pieces file "slices") is stored at any one grid node, in a "scrambled" (i.e., non-contiguous), compressed, and encrypted/signed fashion. The grid _never_ stores more than one copy of the data, and that one copy is never stored all in the same place: it's dispersed using an optimized information-dispersal algorithm that we created, which has properties similar to the previously published info-dispersal algorithms (IDAs).

If a grid node and its associated content (i.e., the user's file slices on that node) are ever completely compromised (firewall comes down, all encryption and scrambling is cracked, etc.), then the cracker acquires at most 1/11th (one-eleventh) of the user's data.

Further, if any 5 of the 11 grid nodes (just under half) are for any reason destroyed or otherwise unavailable, all of the user's data is still accessible. This is done by generating a "coded" file slice for every data slice that we store on a node, and regenerating the missing file slices from down nodes by pumping the available data and coded slices through our info-dispersal algorithms (which are all open-sourced, by the way). These algorithms run on the client side, or when the grid "self heals" after nodes are destroyed.

The system can also be implemented in a cost-effective fashion. The grid system can sustain so many concurrent, per-node outages that the availability/uptime requirements for each node are minimal. Also, the grid-node servers need not support much processing capability, because the client offloads much of the work from the servers.

We feel this system provides a powerful combination of reliability, scalability, economy, and security.

The hardest part of the design, imo, is to be able to reliably track all of these file slices across a large and heterogeneous set of grid-node machines housing these info-dispersed file slices. We designed the grid meta-data system from the ground up to do this and to be capacity-expandable, performance-scalable, and easily serviceable. More details for the open-source flavor of the grid-software design can be found here:
http://wiki.cleversafe.org/Grid_Design [cleversafe.org]

There's much more that I can say about this system; I plan to add additional comments to this thread as more questions and comments arise. I'm sure there are new comments I have yet to read, for they're coming in pretty quickly...

I also encourage further discussion at our newly-created web forums: http://forums.cleversafe.org/ [cleversafe.org]
Mailing lists (that will be synchronized with the web forums) will also be available at cleversafe.org in the near future.

-Matt
Cleversafe project lead