
Error-Proofing Data With Reed-Solomon Codes

kdawson posted more than 6 years ago | from the trust-but-verify dept.

Data Storage

ttsiod recommends a blog entry in which he details steps to apply Reed-Solomon codes to harden data against errors in storage media. Quoting: "The way storage quality has been nose-diving in the last years, you'll inevitably end up losing data because of bad sectors. Backing up, using RAID and version control repositories are some of the methods used to cope; here's another that can help prevent data loss in the face of bad sectors: Hardening your files with Reed-Solomon codes. It is a software-only method, and it has saved me from a lot of grief..."


Been doing this for the past 3 decades (1, Interesting)

xquark (649804) | more than 6 years ago | (#24459605)

slow news day anyone?

Re:Been doing this for the past 3 decades (5, Funny)

Nefarious Wheel (628136) | more than 6 years ago | (#24462493)

Arrrh, aye, this be done since the dawn of time, matey! Ever since the days before global warming when pirates kept a second pistol in their belt just in case. Cap'n Jack Reed in the Solomons would harden his data with a second powder charge when the occasion demanded it.

"Awk! Parroty Error! Parroty Error! Pieces of Seven, Pieces of Seven"

(*BOOM*) never did like that bird.

Typical linux nutcase: HardD's ALREADY do THIS (-1, Flamebait)

Anonymous Coward | more than 6 years ago | (#24463141)

All disks already do this, you dumb fuck. Another linux nutcase copying something that's already been done - FOR DECADES !! You are one stupid mofo, and the stupid ass losers who think "oooooo, great idea!" are only a little less of one.

Re:Been doing this for the past 3 decades (0)

Anonymous Coward | more than 6 years ago | (#24462763)

Damn you're old.

Re:Been doing this for the past 3 decades (1)

Pseudonym (62607) | more than 6 years ago | (#24462907)

Indeed. TFA should get back to me when it discovers LDPC.

Re:Been doing this for the past 3 decades (1)

xquark (649804) | more than 6 years ago | (#24463093)

LDPC-based codes only work for pure erasure channels and do NOT work for static error channels. How does one perform loopy belief propagation when the error probability distribution of the medium (in this case the disk) cannot be modeled correctly?

Re:Been doing this for the past 3 decades (2, Funny)

Reed Solomon (897367) | more than 6 years ago | (#24462993)

Bah, I never make any erorrs

It can make files a bit hard to read, though (3, Funny)

Bromskloss (750445) | more than 6 years ago | (#24459611)

salkdffalkfhwefh2ihr5j45!"Â5jkcq2%"45wceh5 234j5cja4h5c2q4x524qZTkzzj3kzg3qkgl3kzgq3kjgh kq3gkzlq3hwgjlh 34qlgch34ljkw93q0x45c45 #&%#%&5vcXÂ%YXCHGC%ub64bVE5&UBy4vy5yc5E&Â E%vu64EV46rcuw4&C/4w6

Re:It can make files a bit hard to read, though (4, Insightful)

xquark (649804) | more than 6 years ago | (#24459657)

It really depends on where you store the FEC: some techniques store it separately, others concatenate it, and others interleave it. Each method has its own advantages and disadvantages.

Re:It can make files a bit hard to read, though (5, Funny)

Tablizer (95088) | more than 6 years ago | (#24463097)

2ihr5j45!"Â5jkcq2%"45wceh5 234j5cja4h5c2q4x524q kq3gkzlq3hwgjlh 34qlgch34ljkw93q0x45c45 #&%#%&5vcXÂ%

I for one welcome our Perl Overlords

Drives already do this (3, Informative)

Rene S. Hollan (1943) | more than 6 years ago | (#24459631)

... at least CDROMs employ RS codes.

Re:Drives already do this (0, Redundant)

VoyagerRadio (669156) | more than 6 years ago | (#24462471)

Do they do this automatically?

Re:Drives already do this (5, Informative)

Architect_sasyr (938685) | more than 6 years ago | (#24462527)

From CD-ROM wiki:

A CD-ROM sector contains 2352 bytes, divided into 98 24-byte frames. The CD-ROM is, in essence, a data disk, which cannot rely on error concealment, and therefore requires a higher reliability of the retrieved data. In order to achieve improved error correction and detection, a CD-ROM has a third layer of Reed-Solomon error correction.[1] A Mode-1 CD-ROM, which has the full three layers of error correction data, contains a net 2048 bytes of the available 2352 per sector. In a Mode-2 CD-ROM, which is mostly used for video files, there are 2336 user-available bytes per sector. The net byte rate of a Mode-1 CD-ROM, based on comparison to CDDA audio standards, is 44.1k/s×4B×2048/2352 = 153.6 kB/s. The playing time is 74 minutes, or 4440 seconds, so that the net capacity of a Mode-1 CD-ROM is 682 MB.

I'd say that's a yes.

Re:Drives already do this (4, Interesting)

Marillion (33728) | more than 6 years ago | (#24462623)

My biggest failed prediction in the world of computers was the CD-ROM.

I was an audio CD early adopter and I knew from articles I read that audio CDs often had a certain defect rate. The defect rate was usually such that you would never hear it. One artist even published all the defects in the liner notes.

Based upon this, I presumed that you would never get the defect rate to zero and that no one would trust a data medium with anything less than perfection - and thus predicted the CD-ROM would never catch on.

They don't have to get the rate to zero. Just close enough to zero for the RS to function.

Re:Drives already do this (5, Informative)

Solandri (704621) | more than 6 years ago | (#24462837)

That's a pretty fundamental part of information theory - communication in a noisy channel [wikipedia.org] . If your communications (or data storage) are digital, you can overcome any level of random noise (error) at the cost of degraded transmission rate (increased storage requirement). Before CDs, it was (and still is) most prevalent in modem protocols and hard drives. Modern hard drives would probably be impossible without it - read errors are the norm, not the exception [storagereview.com] . It's just hidden from the high-level software by multiple levels of error correction in the low-level firmware.

Re:Drives already do this (4, Interesting)

femto (459605) | more than 6 years ago | (#24462939)

Another view is that everything is a code in a noisy environment, so there is no way to talk about "the underlying device" as it itself is just another type of coding. Magnetic recording can be viewed as a way of encoding information onto the underlying (thermal) noisy matter. There is some very deep stuff happening in information theory. Let's take the empty universe as a noisy channel. Now every structure in the universe (including you and me) becomes information encoded over the empty universe. One gets the feeling that any "ultimate theory" won't be expressed in terms of forces and fields but some underlying, unifying, concept of information.

Re:Drives already do this (5, Funny)

norton_I (64015) | more than 6 years ago | (#24463449)

Dude, put the bong down.

Re:Drives already do this (0)

Anonymous Coward | more than 6 years ago | (#24463073)

Thank you for the links!

Re:Drives already do this (0)

Anonymous Coward | more than 6 years ago | (#24462571)

I am no expert on HD reliability; however, my thought is this:

Even if drive reliability per sector has gone up, the massive growth in storage capacity has perhaps made it more likely that you will have data corruption somewhere -- and perhaps even that the corruption extends beyond what the drive's built-in RS redundancy can regenerate.

With an additional layer of RS the effective redundancy should increase, making it possible to recover from data corruption that would otherwise result in data loss. With the processing power and storage capacity in modern computers, there should be little noticeable overhead, or not?

Re:Drives already do this (2, Informative)

Anonymous Coward | more than 6 years ago | (#24462631)

Yeah, but is there another problem elsewhere in the system? I have an el-cheapo USB-PATA adapter with an MTBF (mean time to bit flip) of about 2 hours. Every other disk was ruined, and I only knew because of a sick little obsession with par2. Disk data is ECC'd, PATA data is parity-checked, and USB data is checksummed. Still, inside one little translator chip, all of that can be ruined. And that's why data integrity MUST be an operating/file system service.

Yes, RS should be a file system service. (3, Insightful)

Futurepower(R) (558542) | more than 6 years ago | (#24462665)

"... data integrity MUST be an operating/file system service."

I agree. I'm willing to have a small loss in speed and a small increase in price to have better data integrity.

There is already data integrity technology embedded in hard drives, and I support making it more robust.

Re:Drives already do this (3, Interesting)

xquark (649804) | more than 6 years ago | (#24462645)

My understanding is that it is possible to drill a few holes, no larger than 2 mm in diameter, equally spread over the surface of an audio CD, and with the help of hardware RS erasure decoding, channel interleaving and channel prediction (e.g. probabilistically reconstructing a missing right channel from the known left channel), produce a near-perfect reconstruction - that's what usually happens to overcome scratches and other kinds of simple surface defects.

Re:Drives already do this (1)

Firehed (942385) | more than 6 years ago | (#24462725)

That might be good for music where a near-lossless reconstruction is acceptable, but I'd suggest not drilling holes in your thesis paper, or any other type of data where lossy compression is unacceptable (everything except images, audio and video, basically)

Re:Drives already do this (2, Funny)

wik (10258) | more than 6 years ago | (#24462755)

You haven't read too many theses. There are holes all over the place...

Re:Drives already do this (5, Informative)

Solandri (704621) | more than 6 years ago | (#24462911)

Data is stored linearly on a CD (and DVD). So the data can survive huge scratches running from the center to the edge, but is very susceptible to circumferential scratches that curve around the center. If you think of a CD as an old-style phonograph record, you can scratch across the grooves and the error correction will fix it; but scratching along a groove will quickly corrupt the data because the scratch will destroy sequential data (and its ECC). That's why they recommend cleaning CDs by wiping from the center out, never in a circular motion.

qpar (0)

Anonymous Coward | more than 6 years ago | (#24459633)

it already exists - QPAR

ZFS? (3, Interesting)

segfaultcoredump (226031) | more than 6 years ago | (#24459653)

Uh, is this not one of the main features of the ZFS file system? It does a checksum on every block written and will reconstruct the data if an error is found (assuming you are using either raid-z or mirroring; otherwise it will just tell you that you had an error).

Re:ZFS? (5, Informative)

xquark (649804) | more than 6 years ago | (#24459719)

Checksums really only help in detecting errors. Once you've found errors, if you have an exact redundant copy somewhere else you can repair them. What Reed-Solomon codes provide is not only the error-detecting ability but also the error-correcting ability, while at the same time reducing the amount of redundancy required to a near-theoretical minimum.

BTW, checksums have limits on how many errors they can detect within, say, a file or other block of data. A rough rule of thumb (not exact) is that 16- and 32-bit checksums can reliably detect up to 16 and 32 bit errors respectively; any more than that and the chance of missing some bit errors goes up - it could even result in not finding any errors at all.

Re:ZFS? (0)

Anonymous Coward | more than 6 years ago | (#24462635)

There are two ways of solving this with ZFS:

* RAID-0, mirrored drives.
* Set copies > 1, which tells ZFS to keep at least N copies of all data written to the storage pool.

Case closed! Next!

Re:ZFS? (2, Informative)

bobbozzo (622815) | more than 6 years ago | (#24462745)

Mirroring is RAID-1, not 0.

Re:ZFS? (1)

atrus (73476) | more than 6 years ago | (#24462693)

ZFS does maintain ECC codes to aid in correction. Even on single disks.

As I understand it... (1)

warrax_666 (144623) | more than 6 years ago | (#24462731)

ZFS checksums are actually hashes, as in "cryptographic hash", so they're pretty damn reliable (though theoretically 100% reliable) at detecting errors.

Re:As I understand it... (0)

xquark (649804) | more than 6 years ago | (#24462759)

OK, let's assume it's a 128-bit hash. For a 1 GB file, how many different 1 GB inputs will produce the same hash? The problem here is that at some point the hash/ECC may fail. It always does; it's not a matter of if but when.

BTW, I'm sure ZFS doesn't do the checksums over 1 GB blocks but over some smaller block size; regardless, the same logic still applies.

Re:As I understand it... (2, Interesting)

Firehed (942385) | more than 6 years ago | (#24462853)

From what I've read and heard, ZFS is designed to pretty much be the last filesystem we'll ever need. I'm pretty sure they've considered hash collisions with regards to data integrity.

Also consider that you probably won't need to reconstruct the entire sector, but only a few bits from it. If there were some sort of insane scenario where you had to reconstruct a complete 1GB block from a single MD5 hash (i.e., "here's an MD5 hash; give me a sequence of 1073741824 bytes to match it"), well, it's technically possible, though the electric bill for your server farm may piss off more than a few treehuggers. On the other hand, if you had only a few bytes that needed repair, brute-force reconstruction, while still time-consuming, suddenly becomes much more feasible. I always wonder why I can't apply this kind of logic to torrents with that one file stuck at 99.98%...

I'm sure that kind of thing is largely irrelevant with ZFS as it's designed to be somewhat more efficient, but you get the point.

Re:As I understand it... (0)

Anonymous Coward | more than 6 years ago | (#24462923)

From what I've read and heard, ZFS is designed to pretty much be the last filesystem we'll ever need. I'm pretty sure they've considered hash collisions with regards to data integrity.

In other words "Of course ZFS does that! I have faith!!"

You must have a pretty Zen relationship with your manager if he accepts arguments like that.

Re:As I understand it... (3, Insightful)

Pseudonym (62607) | more than 6 years ago | (#24462891)

OK, let's assume it's a 128-bit hash. For a 1 GB file, how many different 1 GB inputs will produce the same hash?

You're asking the wrong question.

The right question is: given a 1 GB file, how much "mutation" do you have to do to it to produce a file with the same hash? And the answer to that is: enough to make the data unrecoverable no matter what you do.

Re:As I understand it... (1)

profplump (309017) | more than 6 years ago | (#24463039)

That's not really true, though. While unlikely, it is *possible* that a hash collision occurs on two inputs that vary by only one bit. In most cases we expect a one-bit change in the input to upset about 50% of the bits in the hash output, but that's certainly not true for every possible pair of inputs. Checksums are useful for detecting most small errors, but redundant storage and comprehensive bit-by-bit comparison is the only way to be absolutely sure, and that's generally considered too expensive for use in commodity computing.

Re:As I understand it... (1)

xquark (649804) | more than 6 years ago | (#24463063)

That is correct; there is always the possibility that the data can still appear valid after a large number of bit flips, and that's a problem nonetheless.

Re:As I understand it... (1)

xquark (649804) | more than 6 years ago | (#24463081)

Heard of the birthday paradox? You don't need that much memory to come up with a collision; in fact, most MD hash attacks are based on that principle.

Re:As I understand it... (1)

Eighty7 (1130057) | more than 6 years ago | (#24463277)

This case is different because the checksum is known. The birthday problem is about the probability of at least 1 collision given a certain group size. Given a good uniform n-bit hash, it'll take 2^n tries to collide against a given value, and sqrt(2^n) tries to find a random pair of colliding values. According to wiki, ZFS uses 256-bit checksums.

I see... (1)

warrax_666 (144623) | more than 6 years ago | (#24463353)

That'll teach me to leave out a "not". Of course, I meant "theoretically NOT 100% reliable". :)

The odds of collisions against a given fixed hash (which a hash for a data block is) of course depend on the method, but they are minuscule -- probably less likely than random bit flips on the bus or in RAM. Has anyone even ever found a single example of a SHA256 collision?

Even so, you can *NEVER* be absolutely 100% sure that the data is what you wrote. Even a two-way RAID1 doesn't get you there since you could (theoretically) have identical errors on both drives. Increasing it to a three-way RAID1 with a majority vote (or even just outright declaring an unrecoverable error when a mismatch is found) gets you closer to 100%, but errors are still theoretically possible.

So the point is: You can never attain 100%, but how close to 100% do you want to get? For me, ZFS hashes are "good enough".

Re:ZFS? (3, Informative)

this great guy (922511) | more than 6 years ago | (#24463433)

I have been a ZFS user for a while and know a lot of its internals. Let me comment on what you said.

checksums really only help in detecting errors.

Not in ZFS. When the checksum reveals silent data corruption, ZFS attempts to self-heal by rewriting the sector with a known good copy. Self-healing is possible if you are using mirroring, raidz (single parity), raidz2 (dual parity), or even a single disk (provided the copies=2 filesystem attribute is set). The self-healing algorithm in the raidz and raidz2 cases is actually interesting, as it is based on combinatorial reconstruction: ZFS makes a series of guesses as to which drive(s) returned bad data, reconstructs the data block from the other drives, and then validates each guess by verifying the checksum.

checksums have limits on how many errors they can detect.

All the ZFS checksumming algorithms (fletcher2, fletcher4, SHA-256) generate 256-bit checksums. The default is fletcher2, which offers very good error detection (even for errors affecting more than 256 bits of data) assuming unintentional data corruption (the Fletcher family are not cryptographic hash algorithms; it is actually possible to intentionally find collisions). SHA-256 is collision-resistant, therefore it will in practice detect all data corruption. It would be computationally infeasible to come up with a corrupted data block that still matches the SHA-256 checksum.

These slides [opensolaris.org] are a good intro to the ZFS capabilities.

Can FUSE help? (1)

ApostleJohn (1236294) | more than 6 years ago | (#24459667)

It would be nice if there were a file system layer like encFS, but for error correction.

Interesting (1)

d_jedi (773213) | more than 6 years ago | (#24459681)

I've been burned by scratched DVD+Rs too many times. I'd be interested if there were a way to do this kind of thing in Windows..

Re:Interesting (1)

enoz (1181117) | more than 6 years ago | (#24462611)

The WinRAR archiver has an optional recovery record which protects against bad blocks.

When you create an archive just specify the amount of protection you require (in practice 3% has served me well).

Re:Interesting (1)

xquark (649804) | more than 6 years ago | (#24462719)

I believe the underlying code used in RAR for the recovery record is in fact an RS(255,249) code.

Re:Interesting (3, Informative)

Anonymous Coward | more than 6 years ago | (#24462711)

The cross platform program dvdisaster [dvdisaster.net] will add extra information to your DVD as an error correcting code. Alternatively, you can make a parity file for an already-existing DVD and save it somewhere else.

It actually has a GUI too, so it must be user friendly.

Re:Interesting (1)

Firehed (942385) | more than 6 years ago | (#24462795)

Bit-torrent?

No seriously. I don't know a whole lot about network infrastructure (nor do I care, strictly speaking), but there's clearly some sort of error-checking/correcting going on behind the scenes as I'll grab huge disk images that pass verification before they get mounted (ex. iPhone SDK ~ 1.2GB) all the time. Some sort of network-based solution is really ideal for data transfer.

Of course, with residential upload speeds it's often slower than the ol' sneakernet (depends where it's going, how it's getting there, and how much there is), but not unusably so. I'll SSH half-gig files to/from my home system from work all the time. Grabbing them from the house is somewhat of a painful process that'll often run half an hour to an hour and a half, but oh well.

Obviously splitting your large disk image into a bunch of smaller rars or whatever you prefer could help with connection disruptions, but data integrity issues are almost never an issue.

Re:Interesting (1)

Hal_Porter (817932) | more than 6 years ago | (#24463129)

I've been burned by scratched DVD+Rs too many times. I'd be interested if there were a way to do this kind of thing in Windows..

Actually, Reed-Solomon error correction doesn't always help with scratched CDs - they still skip. The problem there is that the laser can't track scratched media. If it could track, RS would get the data back, though.

Even better (0)

Anonymous Coward | more than 6 years ago | (#24459725)

Just use a modern versioning system, such as Git or Mercurial, which keep track of everything using hashes. Then, you not only get to detect and repair errors, you also get version history.

Re:Even better (1)

ThePromenader (878501) | more than 6 years ago | (#24462915)

A modern versioning system called "Git"? You old...

Drive quality (1)

kipman725 (1248126) | more than 6 years ago | (#24459731)

"The way storage quality has been nose-diving in the last years" I disagree totally all my modern drives are working whereas I used to plagued by hard disk failure. For example are any modern drives as bad as the deathstar? in the past 5 years I have had 0 failures but in the 5 preceding that I had about 8 drive failures. Small sample size I know but to me hard drives are getting better.

Re:Drive quality (1)

dougnaka (631080) | more than 6 years ago | (#24462607)

I just bought a batch of 10 750GB Seagates from NewEgg and have RMA'd 6 of them, and 1 of the RMA'd drives was DOA and had to be RMA'd again. There was almost a silver lining when they shipped us a 1TB replacement, but these are all for RAID 1 mirrors :( Before this I had only had Deathstars, Maxtors and WDs die.

Re:Drive quality (1)

enoz (1181117) | more than 6 years ago | (#24462617)

I'm still waiting for the 5 year warranty to expire on my hard drives.

huh??? (2, Insightful)

Jane Q. Public (1010737) | more than 6 years ago | (#24462661)

If you think storage quality has been nose-diving, then you haven't been around very long. It just isn't so, and there really is not much more I can say to add to that.

I have been around this industry quite a while, and I call bullshit on that.

Harden Files (2, Funny)

inKubus (199753) | more than 6 years ago | (#24462461)

When he said "harden files", I thought he was going into a long soliloquy on all the porn on his computer, so I went to the next story.

Re:Harden Files (1, Insightful)

Anonymous Coward | more than 6 years ago | (#24462767)

It never ceases to amaze me that the juvenile "heh heh heh.. he said 'harden'" response always gets modded funny. Mods, here's a tip: These kinds of jokes aren't funny unless you are a) 13 years old or b) really drunk.

Re:Harden Files (0)

Anonymous Coward | more than 6 years ago | (#24462951)

It never ceases to amaze me that the juvenile "heh heh heh.. he said 'harden'" response always gets modded funny. Mods, here's a tip: These kinds of jokes aren't funny unless you are a) 13 years old or b) really drunk.

Such jokes are rampant here at slashdot.

Re:Harden Files (0)

Anonymous Coward | more than 6 years ago | (#24463031)

c) both

Re:Harden Files (0)

Anonymous Coward | more than 6 years ago | (#24463053)

When he said "harden files", I thought he was going into a long soliloquy on all the porn on his computer, so I went to the next story.

I notice you came back. :-)

this is amazing (0)

Anonymous Coward | more than 6 years ago | (#24462473)

It's like a code that can correct errors!

Never underestimate the redunancy... (2, Interesting)

symbolset (646467) | more than 6 years ago | (#24462479)

Look, if it's secret, one copy is too many. For everything else, gmail it to five separate recipients. It's not like Google has ever lost any of the millions of emails I've received to date. (This is not a complaint -- they don't show me the spam unless I ask for it).

And if they ever did lose an email, well, to paraphrase an old Doritos commercial, "They'll make more."

Seriously, personally I view the persistence of data as a problem. It's harder to let go of than it is to keep.

redundancy (1)

symbolset (646467) | more than 6 years ago | (#24462513)

Yeah, I can spell it. Get a libe.

Speed? (3, Interesting)

grasshoppa (657393) | more than 6 years ago | (#24462515)

My question is one of speed; this seems a promising addition to anyone's backup routine. However, most folks I know have hundreds of gigs of data to back up. While differentials could be involved, right now tar'ing to tape works fast enough that the backup is done before the first staff shows up for work.

I assume we're beating the hell out of the processor here, so I'm wondering: how painful is this in terms of speed?

Re:Speed? (2, Informative)

Zadaz (950521) | more than 6 years ago | (#24462657)

Well, since my $100 Radio Shack CD [wikipedia.org] player I bought in 1990 could do it in real time, I'm guessing that the requirements are pretty low. In fact, a lot of hardware already uses it.

If you read the rest of the page you find out it's very ingenious and efficient at doing what it does.

While it's certainly not new (it's from 1960) or unused (hell, my phone uses it to read QR codes), I'm sure it's something that has been under the radar of a lot of Slashdot readers, so I'll avoid making a "slow news day" snark.

Re:Speed? (0)

Anonymous Coward | more than 6 years ago | (#24463421)

Your CD player took about an hour for 600 MB of data. While that is real-time for audio, it might be a tad slow for saving your mixtape to disk.

Re:Speed? (4, Interesting)

xquark (649804) | more than 6 years ago | (#24462691)

The speed of encoding and decoding directly relates to the type of RS code and the amount of FEC required. Generally speaking, erasure-style RS can go as low as O(n log n) (essentially inverting and solving a Vandermonde- or Cauchy-style matrix). A more general code that can correct errors (the difference between an error and an erasure is that with the latter you know the location of the error but not its magnitude) may require a more complex process, something like syndrome computation plus Berlekamp-Massey plus Forney, which is about O(n^2).

It is possible to buy specialised h/w (or even GPUs) to perform the encoding steps (getting roughly 100+ MB/s), and most software encoders can do about 50-60+ Mb/s for RS(255,223) - YMMV.

Version control != backups (2, Insightful)

XorNand (517466) | more than 6 years ago | (#24462541)

Please, please stop thinking of version control as some sort of backup. When we initially started mandating the use of version control software, developers would just use the "commit" button instead of the "save" button. It makes it *much* more difficult to traverse the repo when you have three dozen commits per day, per developer, each commented with "ok. really should be fixed now." The worst offenders were issued an Etchasketch for a week while their notebooks went in for service *cough*. Problem solved.

Re:Version control != backups (2, Insightful)

dgatwood (11270) | more than 6 years ago | (#24462715)

Well, you shouldn't commit until you believe you have it in a state where the changes are usable (i.e. don't break the tree), but beyond that, I'd rather see more commits of smaller amounts of code than giant commits that change ten thousand things. If you end up having to back out a change, it's much easier if you can easily isolate that change to a single commit. My rule is commit early, commit often. I'm not the only one, either:

http://blog.red-bean.com/sussman/?p=96 [red-bean.com]

Re:Version control != backups (3, Insightful)

ceswiedler (165311) | more than 6 years ago | (#24462771)

The best solution is for developers to use their own private branches. Then they can commit as much as they want, and integrate into the main branch when they're ready. Unfortunately, Subversion has crappy support for integration (even with version 1.5, AFAICT) compared to something like Perforce.

Re:Version control != backups (0)

Anonymous Coward | more than 6 years ago | (#24462779)

The worst offenders were issued an Etchasketch for a week while their notebooks went in for service.

And the president responded by setting the "British Humour" alert status to red.

You are all dumb as there is only one way. (0)

Anonymous Coward | more than 6 years ago | (#24462595)

Whenever you back up to CD or DVD, fill up any unused remaining space with par files generated from the data being backed up.

Reed-Solomon is ancient compared to par2.

Re:You are all dumb as there is only one way. (1)

xquark (649804) | more than 6 years ago | (#24462741)

Fine, we are all dumb. Now tell me what happens when the errors on the disk are spread equally over all the files and the par2 files, to the extent that the drive won't even send data back - what does one do then? How do you tell the drive about the PAR FEC?

It's best to spread data equally over multiple devices, so that if one device were to totally fail you could still get something back (reconstruct) from the other devices.

Re:You are all dumb as there is only one way. (1)

Stalin (13415) | more than 6 years ago | (#24462801)

I hope you are joking. PAR2 is nothing more than an implementation of RS codes. So how can RS codes be ancient compared to PAR2?

From the PAR2 specification introduction [sourceforge.net] :

"The redundant data in the PAR files is computed using Reed-Solomon codes. These codes can take a set of equal-sized blocks of data and produce a number of same-sized recovery blocks. Then, given a subset of original data blocks and some recovery block, it is possible to reproduce the original data blocks. Reed-Solomon codes can do this recovery as long as the number of missing data blocks does not out number the recovery blocks. The design of the Reed-Solomon codes in this spec is based on James S. Plank's tech report at U. of Tennessee entitled A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems. The tech report contains an error, so the design is changed slightly to fix the problem. PAR 2.0 uses a 16-bit Reed-Solomon code and can support 32768 blocks."

I don't want to spend time making Par2 files. (1)

Futurepower(R) (558542) | more than 6 years ago | (#24462861)

Excellent, I agree. But generation of Par2 files should be automatic. I don't mind having only 3.5 gigabytes on a DVD for data if the Par2 files are generated and tested automatically.

Par2 is apparently Reed-Solomon [wikipedia.org] done in a more helpful way.

Quote from the Parity Volume Set Specification 2.0 [sourceforge.net] : "PAR 2.0 uses a 16-bit Reed-Solomon code and can support 32768 blocks."

This is not very useful for changing data... (1)

Jane Q. Public (1010737) | more than 6 years ago | (#24462695)

since it is only a "snapshot" of the data at a particular time. Any time you change the data, you have to do another "snapshot". What a major pain in the ass.

This might be useful for archived files, but not something you change on a regular basis.

I wish PAR2 would have kept improving... (1)

WoTG (610710) | more than 6 years ago | (#24462717)

Yes, CDs and DVDs have error correction built in, but it doesn't do much if you happen to get a nice scratch that follows the spin of the disk. I.e., a moderate scratch from the outside to the inside of a CD is reasonably OK for data, but a scratch the other way will kill your data much more easily.

For a while I was using PAR2, yes, the PAR2 used on USENET, to beef up the safety of my DVD backups of my home data. Unfortunately, PAR2 never really evolved to handle subdirectories properly, which mattered when I wanted an off-site backup of my digital photos.

Eventually, I started using ICE ECC, http://www.ice-graphics.com/ICEECC/IndexE.html [ice-graphics.com] , free as in beer, to enhance my DVD backups of stuff like photos and data. IIRC, I tested its ability to reconstruct missing files and it seemed OK at the time.

Anyways, that's my $0.02 on Reed-Solomon for backups.

Or you could just use PAR... (2, Interesting)

InakaBoyJoe (687694) | more than 6 years ago | (#24462721)

TFA introduces some new ".shielded" file format. But do we need yet another file format when PAR (Parchive) [wikipedia.org] has been doing the same job for years now? The PAR2 format is standardized and well-supported cross-platform, and might just have a future even IF you believe that Usenet is dying [slashdot.org] ...

I always thought it would be cool to have a script that:

  • Runs at night and creates PAR2 files for the data on your HD.
  • Occasionally verifies file integrity against the PAR2 files.

With a system like this, you wouldn't have to worry about throwing away old backups for fear that some random bit error might have crept into your newer backups. Also, if you back up the PAR2 files together with your data, as your backup media gradually degrades with time, you could rescue the data and move it to new media before it was too late.

Of course, at the filesystem level there is always error correction, but having experienced the occasional bit error, I'd like the extra security that having a PAR2 file around would provide. Also, filesystem-level error correction tends to happen silently and not give you any warning until it fails and your data is gone. So a user-level, user-adjustable redundancy feature that's portable across filesystems and uses a standard file format like PAR would be really useful.

This is not the same thing as PAR ... (4, Informative)

DrJimbo (594231) | more than 6 years ago | (#24462959)

... even though both TFA and PAR use Reed-Solomon.

The difference is that TFA interleaves the data so it is robust against sector errors. A bad sector contains bytes from many different data blocks so each data block only loses one byte which is easy to recover from. If you use PAR and encounter a bad sector, you're SOL.

PAR was designed to solve a different problem and it solves that different problem very well but it wasn't designed to solve the problem that is addressed by TFA. Use PAR to protect against "the occasional bit error" as you suggest, but use the scheme given in TFA to protect against bad sectors.

PAR2, anyone? (1)

mystik (38627) | more than 6 years ago | (#24462723)

Doesn't par2 already employ reed-solomon? (http://en.wikipedia.org/wiki/Parchive [wikipedia.org] )

And it has all sorts of options let you configure the amount of redundancy you'd like?

And it has (ahem) been very well tested in the recovery of incomplete binary archives ... ?

Now that usenet has been stripped of binaries, we'll have to find other uses for these tools ....

what about quickpar and dvdisaster? (4, Informative)

MoFoQ (584566) | more than 6 years ago | (#24462733)

quickpar [quickpar.org.uk] especially has been in use on Usenet/newsgroups for years... oh yeah, forgot... they are trying to kill it.

Anyway, there's also dvdisaster [dvdisaster.net], which now has several ways of "hardening".
One of them catches my attention: it adds error correction data to a CD/DVD (via a disc image/ISO).

Why IS storage quality going down? (1)

jerryasher (151512) | more than 6 years ago | (#24462797)

I'm glad it's not just me thinking my drives are dying sooner than they once did.

Why is storage quality going down, and what does that mean for that 1TB drive for $200? Will its lifespan exceed two years?

Re:Why IS storage quality going down? (1)

rrohbeck (944847) | more than 6 years ago | (#24462947)

Because everybody uses desktop quality SATA drives in enterprise RAIDs. And every vendor pushes density in desktop drives as hard as possible even though it's been getting more and more difficult.
The market for high end "enterprise" drives is almost dead. When was the last time you saw a SCSI (FC,SAS) drive?
There's nothing wrong with the basic approach but you have to do the math and use the correct AFR and TTR numbers. We just went from RAID 5 to RAID 6 because the observed drive failure rates were higher (by a factor of 3) than what the vendors promised and the system failure rates were just too high.
For example, in one drive, it turned out that SATA error recovery (timeout control) didn't work as advertised, so the "enterprise ready" SATA drives weren't all that enterprise-ish. And, OBTW, FW upgrades in the field turned out to be totally unreliable and turned drives into bricks:
the FW upgrade was only tested on Intel ICH SATA ports, and the RAID controller just didn't have the right timing :(

Re:Why IS storage quality going down? (1)

lukas84 (912874) | more than 6 years ago | (#24463371)

The market for high end "enterprise" drives is almost dead. When was the last time you saw a SCSI (FC,SAS) drive?

What? Maybe on your planet. 2.5" SAS drives in servers are common - even in mid-range servers.

What kind of servers are you buying that are using SATA instead of SAS?

Wasn't PAR designed for this problem? (0)

Anonymous Coward | more than 6 years ago | (#24462847)

http://parchive.sourceforge.net/

hardware (0)

Anonymous Coward | more than 6 years ago | (#24462859)

Who is storing critical files without error correction and a checksummed file system?

Or in other words, ECC + ZFS.

You can't complain about Data Loss, if you're running cheap desktop hardware on NTFS.

uhhm, raid6? (1)

RelliK (4466) | more than 6 years ago | (#24462883)

RAID6 [wikipedia.org] uses Reed-Solomon error correction. In fact, RAID5 can be viewed as a special case of RAID6.

This thing looks like a solution in search of a problem. Slow news day?

Bose Chaudhuri (1, Informative)

ishmalius (153450) | more than 6 years ago | (#24462969)

These codes, http://en.wikipedia.org/wiki/BCH_code [wikipedia.org] , are far superior. However, both Miller code and these pale in comparison to Low-Density Parity-Check codes. http://en.wikipedia.org/wiki/Low-density_parity-check_code [wikipedia.org]

Re:Bose Chaudhuri (1)

xquark (649804) | more than 6 years ago | (#24463109)

RS codes are a subset of BCH codes; if anything is far superior, it would be RS codes, as they are more generalized, have better specific decoding techniques, and can be both systematic and linear.

Again, as mentioned before, LDPCs are not useful in these situations; they are only useful for overcoming erasures within data communications.

Re:Bose Chaudhuri (1)

locofungus (179280) | more than 6 years ago | (#24463181)

Again, as mentioned before, LDPCs are not useful in these situations; they are only useful for overcoming erasures within data communications.

Why aren't erasure codes any good here? What errors are we trying to recover from?

Don't we have known bad sectors from the disk?

I don't know whether, when you get a bad sector, the drive returns nothing or can be told to return its best guess, so I can see it might depend on whether your code word is a sector or something smaller.

Tim.

Re:Bose Chaudhuri (0)

Anonymous Coward | more than 6 years ago | (#24463233)

It is also difficult to build real-time LDPC codecs that approach capacity. It is practical to do so only for specialised classes of LDPC codes. While LDPC parity-check matrices are sparse, in general the generator matrices are large and not sparse.

ATTN MODS (0, Offtopic)

Hobart (32767) | more than 6 years ago | (#24463357)

Mod parent, grandparent, and great grandparent way up please - this is the most significant thread in the comments so far.

Re:Bose Chaudhuri (0)

Anonymous Coward | more than 6 years ago | (#24463461)

These codes, http://en.wikipedia.org/wiki/BCH_code [wikipedia.org] , are far superior.

Given your penchant for linking to wikipedia, why not mention http://en.wikipedia.org/wiki/Reed_Solomon#Reed-Solomon_codes_as_BCH_codes [wikipedia.org] ?

Reed-Solomon codes are, in fact, a subset of the BCH codes. To call the BCH codes strictly superior is a little bit misleading.

Is this step really necessary? (1)

Kjella (173770) | more than 6 years ago | (#24463105)

Yes, this has been done forever, etc., but has anyone actually experienced any ugly "bit rot"? I mean, I've had firewalls that would checksum applications, and if they ever complained about surprise changes I didn't catch it. Equally, I have about 100GB for which I have CSVs - no spontaneous corruption to note. Source code should very easily fail to compile if a random bit were flipped; again, I can't think of any case. I guess if it's that important, having a PAR file with some recovery data won't hurt, but first I'd take RAID + backups any day.

PAR2 (1)

rm-ce (1274516) | more than 6 years ago | (#24463117)

There is already a pretty mature and fairly widely used application, par2, that does this.

It's handy for downloading binaries off Usenet, where you might lose a few parts
(say, only being able to download 18 of the 20 files).

I've also used it successfully with files burned to CD-R that later became corrupted beyond what the CD error correction could handle.

Much recommended!

http://en.wikipedia.org/wiki/Parchive [wikipedia.org]


Information Theory 101 (1)

profplump (309017) | more than 6 years ago | (#24463149)

Channel noise can be overcome via increased redundancy in transmission/storage, thereby reducing the effective transfer rate/storage density. Film at 11.

I could be wrong, but I'm pretty sure this is why we have on-disk (and on-bus) checksums and ECC RAM. And frankly if your mission-critical data is being ruined by DVD scratches, adding RS codes to your DVDs is probably not going to solve the fundamental problem of system administrator incompetence.

/ Seriously, these days Fark has more technically competent and interesting articles than /.


Datarecovery "data". (4, Insightful)

rew (6140) | more than 6 years ago | (#24463305)

Working for a data recovery company, I know that in about half the cases where data is lost, the whole drive "disappears". So, bad sectors? You can solve that problem with Reed-Solomon! Fine! But that doesn't replace the need for backups to help you recover from accidental deletion, fire, theft and total disk failure (and probably a few other things I can't come up with right now)...

RS works like a dream for mailing barcodes (1)

Centurix (249778) | more than 6 years ago | (#24463351)

Australia Post implemented the Royal Mail's 4-state barcoding system for all bulk and pre-sorted mail categories. The barcode incorporates RS and greatly improves the scan rate of damaged mail. The RM4SCC was adopted throughout the US and Canada.
