
One Way To Save Digital Archives From File Corruption

timothy posted more than 4 years ago | from the don'tcha-love-finding-corrupted-files dept.

Data Storage 257

storagedude points out this article about one of the perils of digital storage, the author of which "says massive digital archives are threatened by simple bit errors that can render whole files useless. The article notes that analog pictures and film can degrade and still be usable; why can't the same be true of digital files? The solution proposed by the author: two headers and error correction code (ECC) in every file."


257 comments

Too much reinvention (5, Interesting)

DarkOx (621550) | more than 4 years ago | (#30322692)

If this type of thing is implemented at the file level every application is going to have to do its own thing. That means too many implementations, most of which won't be very good or well tested. It also means application developers will have to be busy slogging through error correction data in their files rather than the data they actually wanted to persist for their application. I think the article offers a number of good ideas, but it would be better to do most of them at the filesystem and perhaps some at the storage layer.
    Also, if we can present the same logical file to the application on read, even if every 9th byte on disk is parity, that is a plus, because it means legacy apps get the enhanced protection as well.
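
As a rough illustration of that "every 9th byte is parity" idea, here is a minimal Python sketch (the 8+1 interleaving and the plain XOR parity are assumptions made for the sketch; a real filesystem would use a stronger code and do this below the file API, invisibly to applications):

def add_parity(data: bytes) -> bytes:
    # After every 8 data bytes, append one XOR-parity byte.
    out = bytearray()
    for i in range(0, len(data), 8):
        group = data[i:i + 8]
        parity = 0
        for b in group:
            parity ^= b
        out += group + bytes([parity])
    return bytes(out)

def strip_parity(stored: bytes):
    # Return the logical data plus the indices of groups whose parity no longer matches.
    data, damaged = bytearray(), []
    for g, i in enumerate(range(0, len(stored), 9)):
        record = stored[i:i + 9]
        if len(record) < 2:
            break
        group, parity = record[:-1], record[-1]
        check = 0
        for b in group:
            check ^= b
        if check != parity:
            damaged.append(g)  # detection only; correction needs a stronger code
        data += group
    return bytes(data), damaged

The application only ever sees the output of strip_parity(), so the extra bytes stay an implementation detail of the storage layer.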

Re:Too much reinvention (-1, Redundant)

imamac (1083405) | more than 4 years ago | (#30322712)

If this type of thing is implemented at the file level every application is going to have to do its own thing.

Great. So add it to the system level.

Re:Too much reinvention (1)

commodore64_love (1445365) | more than 4 years ago | (#30322768)

>>>>> If this type of thing is implemented at the file level every application is going to have to do its own thing.
>>
>>Great. So add it to the system level.

Somebody has ADD and didn't bother to finish reading the *whole* paragraph. Quote: "It would be better to do most of them at the filesystem..."

Re:Too much reinvention (5, Insightful)

paradxum (67051) | more than 4 years ago | (#30322838)

It already exists, it's called ZFS on solaris boxxen. Each block uses ECC, it can correct itself on each read, and generally can indicate a failing disk. This truly is the filesystem every other one is playing catchup with.

Re:Too much reinvention (1, Informative)

Anonymous Coward | more than 4 years ago | (#30322950)

Or, for those that prefer a BSD to a SysV unix, it also works fine on FreeBSD.

Re:Too much reinvention (0)

Anonymous Coward | more than 4 years ago | (#30323004)

Have you tried RCFS? The Egyptians did. Just need to replace the drive platter and head with a rock and chisel and we good for another 5k.

Re:Too much reinvention (3, Funny)

Whalou (721698) | more than 4 years ago | (#30323036)

ReiserFS is good for that also. If you make a deal with the 'file system' it will tell you where your 'file' is hidden.

Re:Too much reinvention (1)

Linker3000 (626634) | more than 4 years ago | (#30323064)

You have just taken the hideous word 'boxen' to new heights!

True, though, ZFS is a bleedin obvious implementation of this kind of thing

Re:Too much reinvention (1, Funny)

Anonymous Coward | more than 4 years ago | (#30322740)

At least they didn't suggest just sticking the files "in the cloud".

Re:Too much reinvention (1)

Rockoon (1252108) | more than 4 years ago | (#30322924)

What we are talking about here is how to add more redundancy on the software level.. but honestly..

...why not do it at the hardware level, where there is already redundancy and it can't be fucked up by an additional error vector?

Re:Too much reinvention (5, Insightful)

MrNaz (730548) | more than 4 years ago | (#30323018)

Ahem. RAID anyone? ZFS? Btrfs? Hello?

Isn't this what filesystem devs have been concentrating on for about 5 years now?

Re:Too much reinvention (2, Insightful)

Interoperable (1651953) | more than 4 years ago | (#30323076)

I agree that filesystem-level error correction is a good idea. Having the option to specify ECC options for a given file or folder would be great functionality to have. The idea presented in this article, however, is that certain compressed formats don't need ECC for the entire file. Instead, as long as the headers are intact, a few flipped bits here or there will result in only some distortion; not a big deal if it's just vacation photos/movies.

By only having ECC in the headers, you would save a good deal of storage space and processing time. It wouldn't need to be supported in every application either, just the codecs. Individual codecs could include it fairly easily as they release new versions, which wouldn't be backward compatible anyway, so you don't introduce a new problem. I think it's a good idea: it would keep media readable with very little overhead, just a few odd pixels during playback even in a corrupted file.
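
As a sketch of that header-only protection, here is a toy container layout (invented for this example: a length field, three copies of the header, then the untouched payload) using a simple repetition code with byte-wise majority voting; a real codec would more likely use Reed-Solomon, but the principle is the same:

import struct

COPIES = 3  # a 3x repetition code: crude, but any single corrupted copy of a byte is outvoted

def protect_header(header: bytes, payload: bytes) -> bytes:
    # Layout: 4-byte header length, three copies of the header, then the payload as-is.
    return struct.pack(">I", len(header)) + header * COPIES + payload

def recover_header(blob: bytes):
    (hlen,) = struct.unpack_from(">I", blob, 0)  # the length field itself is unprotected here
    copies = [blob[4 + i * hlen: 4 + (i + 1) * hlen] for i in range(COPIES)]
    # Majority vote, byte by byte, across the stored copies.
    header = bytes(max(set(column), key=column.count) for column in zip(*copies))
    payload = blob[4 + COPIES * hlen:]
    return header, payload

The payload is stored without any redundancy, which is exactly the trade-off the parent comment describes: a flipped bit there shows up as a playback glitch instead of an unreadable file.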

Re:Too much reinvention (3, Insightful)

bertok (226922) | more than 4 years ago | (#30323188)

If this type of thing is implemented at the file level every application is going to have to do its own thing. That means too many implementations, most of which won't be very good or well tested. It also means application developers will have to be busy slogging through error correction data in their files rather than the data they actually wanted to persist for their application. I think the article offers a number of good ideas, but it would be better to do most of them at the filesystem and perhaps some at the storage layer.

    Also, if we can present the same logical file to the application on read, even if every 9th byte on disk is parity, that is a plus, because it means legacy apps get the enhanced protection as well.

Precisely. This is what things like torrents, RAR files with recovery blocks, and filesystems like ZFS are for: so every app developer doesn't have to roll their own, badly.

par files (5, Informative)

ionix5891 (1228718) | more than 4 years ago | (#30322704)

include par2 files

par2 files (0)

Anonymous Coward | more than 4 years ago | (#30323330)

For any media you actually expect to retain for years, possibly without touching, create par2 files with, say, 10% redundancy to correct any errors later.


#!/bin/sh
for filename in "$@"; do
      # Create 10% recovery data with a block size of 300 KB
      nice par2 create -s307200 -r10 "$filename"
done

Or just use PAR for your archives (1)

syntap (242090) | more than 4 years ago | (#30322716)

Done. +1 to the poster who said there is some round transportation implement being reinvented here.

It's that computer called the brain. (5, Interesting)

commodore64_love (1445365) | more than 4 years ago | (#30322724)

>>>"...analog pictures and film can degrade and still be usable; why can't the same be true of digital files?"

The ear-eye-brain connection has ~500 million years of development behind it and has learned to filter out noise. If, for example, I'm listening to a radio, the hiss is mentally filtered out, or if I'm watching a VHS tape that has wrinkles, my brain can focus on the undamaged areas. In contrast, when a computer encounters noise or errors, it panics and says, "I give up," and the digital radio or digital television goes blank.

What we need is a smarter computer that says, "I don't know what this is supposed to be, but here's my best guess," and displays noise. Let the brain then take over and mentally remove the noise from the audio or image.

Re:It's that computer called the brain. (4, Interesting)

commodore64_love (1445365) | more than 4 years ago | (#30322794)

P.S.

When I was looking for a digital-to-analog converter for my TV, I returned all the ones that displayed blank screens when the signal became weak. The one I eventually chose (x5) was the Channel Master unit. When the signal is weak it continues displaying a noisy image, rather than go blank, or it reverts to "audio only" mode, rather than go silent. It lets me continue watching programs rather than be completely cutoff.

Re:It's that computer called the brain. (0)

Anonymous Coward | more than 4 years ago | (#30323186)

That's done when it's possible to do so. Good luck trying it on something like a zip file.

Don't confuse background noise with a gap in the signal. Your speakers don't hiss when a CD is scratched, they pop, which is why in some cases it's better to just wait until the stream looks good again.

Re:It's that computer called the brain. (2, Insightful)

ILongForDarkness (1134931) | more than 4 years ago | (#30323276)

And how well did that work for your last corrupted text file? Or a printer job that the printer didn't know how to handle? My guess is you could pick out a few words and the rest was random garble. The mind is good at filtering out noise, but it is an intrinsically hard problem to do a similar thing with a computer. Say a random bit is missed and the whole file ends up shifted one to the left: how does the computer know that the combinations of pixel values it is displaying should start one bit out of sync, so that the still-existing data "looks" good? Similarly with a text file, all the remaining bits could be valid characters; how is a computer to know what characters to show other than by having the correct data?

Re:It's that computer called the brain. (1)

nedlohs (1335013) | more than 4 years ago | (#30323326)

Sure, but you'd need to not compress anything and introduce redundancy. Most people prefer using efficient formats and doing error checks and corrections elsewhere (in the filesystem, by external correction codes - par2 springs to mind, etc).

Here's a simple image format that will survive bit flip style corruption:

Every pixel is stored with 96 bits: the first 32 are the width of the image, the next 32 are the height, the next 8 bits are the R, the next 8 bits are the G, the next 8 bits are the B, and the last 8 bits are the alpha channel.

To read the image every 96 bit chunk is read and the most common 64 bit prefix is used as the width/height. And the 24 bits of color data for each pixel is used as is.

That will survive large amounts of bit corruption just fine - it won't survive shifting (adding 32 bits of extra data at the start of the file, for example). It would also be ridiculously large compared with the image file formats we actually use.
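
A rough Python rendering of that toy format (12-byte records, big-endian fields, majority vote over the repeated 64-bit width/height prefix); it is only a sketch of the idea, not a usable image format:

import struct
from collections import Counter

def encode(width, height, pixels):
    # pixels: list of (r, g, b, a) tuples in row-major order. Every record repeats
    # the 32-bit width and height so the decoder can majority-vote them later.
    out = bytearray()
    for r, g, b, a in pixels:
        out += struct.pack(">IIBBBB", width, height, r, g, b, a)
    return bytes(out)

def decode(data):
    records = [data[i:i + 12] for i in range(0, len(data) - 11, 12)]
    # The most common 8-byte prefix wins, so scattered bit flips in a few
    # records cannot change the decoded dimensions.
    prefix = Counter(rec[:8] for rec in records).most_common(1)[0][0]
    width, height = struct.unpack(">II", prefix)
    pixels = [tuple(rec[8:12]) for rec in records]
    return width, height, pixels

A flipped bit in one record perturbs at most one pixel (or one losing vote on the dimensions), while an inserted or deleted byte still shifts every following record, exactly as the comment says.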

Re:It's that computer called the brain. (4, Interesting)

Phreakiture (547094) | more than 4 years ago | (#30323388)

What we need is a smarter computer that says, "I don't know what this is supposed to be, but here's my best guess," and displays noise. Let the brain then take over and mentally remove the noise from the audio or image.

Audio CDs have always done this. Audio CDs are also uncompressed*.

The problem, I suspect, is that we have come to rely on a lot of data compression, particularly where video is concerned. I'm not saying this is the wrong choice, necessarily, because video can become ungodly huge without it (NTSC SD video -- 720 x 480 x 29.97 -- in the 4:2:2 colour space, 8 bits per pixel per plane, will consume 69.5 GiB an hour without compression), but maybe we didn't give enough thought to stream corruption.

Mini DV video tape, when run in SD, uses no compression on the audio, and the video is only lightly compressed, using a DCT-based codec, with no delta coding. In practical terms, what this means is that one corrupted frame of video doesn't cascade into future frames. If my camcorder gets a wrinkle in the tape, it will affect the frames recorded on the wrinkle, and no others. It also makes a best-guess effort to reconstruct the frame. This task may not be impossible with more dense codecs that do use delta coding and motion compensation (MPEG, DiVX, etc), but it is certainly made far more difficult.

Incidentally, even digital cinemas are using compression. It is a no-delta compression, but the individual frames are compressed in a manner akin to JPEGs, and the audio is compressed either using DTS or AC3 or one of their variants in most cinemas. The difference, of course, is that the cinemas must provide a good presentation. If they fail to do so, people will stop coming. If the presentation isn't better than watching TV/DVD/BluRay at home, then why pay the $11?

(* I refer here to data compression, not dynamic range compression. Dynamic range compression is applied way too much in most audio media)

Sun Microsystems..... zfs..... (3, Insightful)

HKcastaway (985110) | more than 4 years ago | (#30322728)

ZFS.

Next topic....

Re:Sun Microsystems..... zfs..... (0)

Anonymous Coward | more than 4 years ago | (#30322782)

Right, and there's so many Linux distros with ZFS in them. (No, not counting FUSE.)

Oh wait, you can use FreeBSD instead.

FreeBSD FTW.

Re:Sun Microsystems..... zfs..... (0)

Anonymous Coward | more than 4 years ago | (#30322908)

Right, and there's so many Linux distros with ZFS in them. (No, not counting FUSE.)

Oh wait, you can use FreeBSD instead.

FreeBSD FTW.

The Linux powers-that-be are the ones that chose to distribute their product under a license that can't be mixed with others.

Re:Sun Microsystems..... zfs..... (0)

Anonymous Coward | more than 4 years ago | (#30323012)

Right, and there are so many Linux distros with ZFS in them. (No, not counting FUSE.)

Oh wait, you can use FreeBSD instead.

FreeBSD FTW.

The Linux powers-that-be are the ones that chose to distribute their product under a license that can't be mixed with others.

Indeed. I was being sardonic. I'm a FreeBSD user.

Re:Sun Microsystems..... zfs..... (0)

Anonymous Coward | more than 4 years ago | (#30323080)

Okay, that explains it. There aren't any sardonic Linux users.

Re:Sun Microsystems..... zfs..... (1)

jedidiah (1196) | more than 4 years ago | (#30323336)

> The Linux powers-that-be are the ones that chose to distribute their product under a license that can't be mixed with others.

That was like... 25 years ago.

That excuse doesn't really work well for anything released recently.

No. It's the authors of newer works that choose not to "play nice".

Re:Sun Microsystems..... zfs..... (1)

sskinnider (1069312) | more than 4 years ago | (#30323104)

Hard drives are not yet a ubiquitous medium for archiving. Once the files are written to CD or tape, you lose the advantage that hardware or filesystem protection gave you, and now you are dependent on the lifespan and degradation of the media.

Re:Sun Microsystems..... zfs..... (1)

tepples (727027) | more than 4 years ago | (#30323222)

Once the files are written to CD or tape you lose the advantage that hardware or filesystem protection gave you

PAR2. Better?

Re:Sun Microsystems..... zfs..... (1)

xZgf6xHx2uhoAj9D (1160707) | more than 4 years ago | (#30323526)

Why? There's no reason a filesystem like ZFS can't be used on CD or tape and a lot of people do use them.

Even if you didn't want to do that, ISO 9660, the filesystem used by default on data CDs, contains its own error correction scheme (288 bytes of redundancy for every 2048 byte block).

PNG was designed to be able to do this (1, Interesting)

Anonymous Coward | more than 4 years ago | (#30322734)

The PNG image format divides the image data into "chunks", typically 8kbytes each, and each having a CRC checksum. You'd archive two copies of each image, presumably in two places and on different media. Years later you check both files for CRC errors. If there are just a few errors, probably they won't occur in the same chunk, so you can splice the good chunks from each stored file to create a new good file.
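
A sketch of that splice-two-copies idea in Python, assuming both copies were written with an identical chunk layout (true for byte-for-byte copies) and leaving aside a corrupted chunk-length field, which would throw the scan off:

import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def read_chunks(path):
    # Yield (type, data, crc_ok) for each chunk in a PNG file.
    with open(path, "rb") as f:
        assert f.read(8) == PNG_SIG, "not a PNG file"
        while True:
            head = f.read(8)
            if len(head) < 8:
                break
            length = struct.unpack(">I", head[:4])[0]
            ctype = head[4:8]
            data = f.read(length)
            crc = struct.unpack(">I", f.read(4))[0]
            yield ctype, data, (zlib.crc32(ctype + data) & 0xffffffff) == crc

def splice(copy_a, copy_b, out_path):
    # For each chunk position, keep whichever copy still passes its CRC.
    merged = []
    for (ta, da, oka), (tb, db, okb) in zip(read_chunks(copy_a), read_chunks(copy_b)):
        if oka:
            merged.append((ta, da))
        elif okb:
            merged.append((tb, db))
        else:
            raise ValueError("both copies are damaged in the same chunk")
    with open(out_path, "wb") as out:
        out.write(PNG_SIG)
        for ctype, data in merged:
            out.write(struct.pack(">I", len(data)) + ctype + data)
            out.write(struct.pack(">I", zlib.crc32(ctype + data) & 0xffffffff))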

Re:PNG was designed to be able to do this (0)

Anonymous Coward | more than 4 years ago | (#30322818)

Someone should make an app for that.

Re:PNG was designed to be able to do this (1)

the_hellspawn (908071) | more than 4 years ago | (#30323426)

An app, someone has an i-garbage in their butt. It would be more like is there a script for that.

What files does a single bit error destroy? (2, Insightful)

jmitchel!jmitchel.co (254506) | more than 4 years ago | (#30322742)

What files does a single bit error irretrievably destroy? Obviously it may cause problems, even very annoying problems when you go to use the file. But unless that one bit is in a really bad spot that information is pretty recoverable.

Re:What files does a single bit error destroy? (3, Informative)

Jane Q. Public (1010737) | more than 4 years ago | (#30322796)

That's complete nonsense. Just for one example, if the bit is part of a numeric value, depending on where that bit is, it could make the number off anywhere from 1 to 2 BILLION or even a lot more, depending on the kind of representation being used.

Re:What files does a single bit error destroy? (1)

maxume (22995) | more than 4 years ago | (#30322868)

Perhaps that is what the poster meant by "bad spot". If "Hitler" were altered to read as "Hatler", I'm pretty sure the meaning would still be clear from the context.

Re:What files does a single bit error destroy? (2, Funny)

gzipped_tar (1151931) | more than 4 years ago | (#30322920)

Perhaps that is what the poster meant by "bad spot". If "Hitler" were altered to read as "Hatler", I'm pretty sure the meaning would still be clear from the context.

Godvin.

Re:What files does a single bit error destroy? (1)

91degrees (207121) | more than 4 years ago | (#30322952)

People like you are worse than Hatler!

Re:What files does a single bit error destroy? (1)

Raffaello (230287) | more than 4 years ago | (#30322996)

People like you are worse than Hatter!

Re:What files does a single bit error destroy? (1)

Muad'Dave (255648) | more than 4 years ago | (#30322978)

I actually checked to see if 'v' and 'w' were different by only a single bit. * facepalm *

Re:What files does a single bit error destroy? (0)

Anonymous Coward | more than 4 years ago | (#30323048)

Perhaps that is what the poster meant by "bad spot". If "Hitler" were altered to read as "Hatler", I'm pretty sure the meaning would still be clear from the context.

Hrm.. You may have just explained the Nostradamus prediction of "Hister" instead of "Hitler". A single bit corruption in the "From the Future Media Stream" he was watching.

Re:What files does a single bit error destroy? (0)

Anonymous Coward | more than 4 years ago | (#30323122)

well, we should start storing numeric values as images and use ocr to process it. that way if a bit is wrong, the error would not have any consequence, like when you watch tv a dead pixel does not keep you from understanding what is going on.

I think I should patent this idea, it's obviously the future.

Re:What files does a single bit error destroy? (0)

Anonymous Coward | more than 4 years ago | (#30323278)

damb, i had a perfect idea and started writing a whole paragraph, then I went to find a quick page and the bloody browser didn't start a new tab, so I lost the text and can't be bothered to enter it all again.

The last bit was just - COMPUTERS CAN MAKE PERFECT COPIES AND DATA STORAGE IS RELATIVELY CHEAP.

That really depends on the file (1)

FrankDerKte (1472221) | more than 4 years ago | (#30323542)

It depends on exactly what is encoded with this number. If it is a pixel in an image, flipping one bit won't destroy the image. The same is true for video and plain text. Flipping a bit in a text file would change exactly one letter.

The problem lies in the encoding and the type of information saved.

If, for example, a binary format is used, there has to be a way to identify the borders of the different data fields. For fields of fixed length, this can be done by counting. If the fields have no fixed length, this can be done with byte stuffing.

If we want to save a linked list in a binary format using byte stuffing, we take one byte and define it to be the list-separator character. Then we encode the list in such a way that our separator never turns up inside the data itself. Normally this is done by defining an escape character which tells us that the next character is not the special character used for byte stuffing.

For example, say we want to save a list of sentences using ASCII and define the pipe symbol "|" as the special character. The resulting file would look like this:

Hello world|This is a simple example|for the almighty interweb

Now, the only way the file would turn out to be unusable after flipping a bit is if a bit of the byte used for the pipe symbol were flipped. Flipping any other bit would change the meaning, but the program would still be able to load it.
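
A minimal sketch of that byte-stuffing scheme; the "|" separator and "\" escape byte are arbitrary choices for the illustration:

SEP = b"|"
ESC = b"\\"

def encode(items):
    # Escape any separator or escape bytes inside each item, then join with the separator.
    out = []
    for item in items:
        data = item.encode("utf-8")
        data = data.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)
        out.append(data)
    return SEP.join(out)

def decode(blob):
    items, current, i = [], bytearray(), 0
    while i < len(blob):
        b = blob[i:i + 1]
        if b == ESC and i + 1 < len(blob):
            current += blob[i + 1:i + 2]  # escaped byte: take it literally
            i += 2
        elif b == SEP:
            items.append(current.decode("utf-8", errors="replace"))
            current = bytearray()
            i += 1
        else:
            current += b
            i += 1
    items.append(current.decode("utf-8", errors="replace"))
    return items

blob = bytearray(encode(["Hello world", "This is a simple example", "for the almighty interweb"]))
blob[3] ^= 0x04                # flip one bit inside the first item
print(decode(bytes(blob)))     # only the first item is garbled; the other two survive intact

As the comment says, the fragile bytes are the separators and escapes themselves: flip a bit in one of those and two items merge or split, though the rest of the list still parses.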

So it depends on what and how it is saved. For most media files flipping a bit would not render the file useless. For an account it probably would.

If the data is important, add redundancy, that is additional bits for error correction. If the data is not important, don't add redundancy because it increases the file size.

That is exactly the reason why compressed files are useless after a flipped bit. A compression algorithm removes redundant bits to decrease file size.

Re:What files does a single bit error destroy? (5, Insightful)

Rockoon (1252108) | more than 4 years ago | (#30322872)

Most modern compression formats will not tolerate any errors. With LZ a single bit error could propagate over a long expanse of the uncompressed output, while with Arithmetic encoding the remainder of the file following the single bit error will be completely unrecoverable.

Pretty much only the prefix-code style compression schemes (Huffman for one) will isolate errors to short segments, and then only if the compressor is not of the adaptive variety.
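
That fragility is easy to demonstrate with zlib (LZ77 plus Huffman coding): flip a single bit in the compressed stream and decompression either fails outright or no longer matches the original. A small demonstration using only Python's standard zlib module:

import zlib

original = b"The quick brown fox jumps over the lazy dog. " * 100
damaged = bytearray(zlib.compress(original))

# Flip one bit somewhere in the middle of the compressed stream.
damaged[len(damaged) // 2] ^= 0x01

try:
    recovered = zlib.decompress(bytes(damaged))
    # If decompression happens to succeed, the output usually diverges from the original.
    print("decompressed; identical to original:", recovered == original)
except zlib.error as exc:
    print("decompression failed outright:", exc)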

Re:What files does a single bit error destroy? (1)

je ne sais quoi (987177) | more than 4 years ago | (#30323146)

Doesn't rar have a capacity to have some redundancy though? I seem to recall that downloading multi-part rar files from usenet a while ago that included some extra files that could be used to rebuild the originals in the case of file corruption (which happened fairly often with usenet).

Re:What files does a single bit error destroy? (1)

netsharc (195805) | more than 4 years ago | (#30322968)

I'd venture to say TrueCrypt containers, when that corruption occurs at the place where they store the encrypted symmetrical key. Depending on the size of said container it could be the whole harddisk. :)

Re:What files does a single bit error destroy? (1)

EllisDees (268037) | more than 4 years ago | (#30323362)

I've got a 10 gig .tar.bz2 file that I've only been partially able to recover due to a couple of bad blocks on a hard drive. I ran bzip2recover on it, which broke it into many, many pieces, and then put them back together into a partially recoverable tar file. Now I just can't figure out how to get past the corrupt pieces.:(

clueless (0)

Anonymous Coward | more than 4 years ago | (#30322756)

Stupid idea. Nowadays digital preservation is more about file format conversion than about bit rot.

What about the "block errors"? (4, Informative)

MathFox (686808) | more than 4 years ago | (#30322792)

Most of the storage media in common use (disks, tapes, CD/DVD-R) already use ECC at the sector or block level and will fix "single bit" errors transparently in firmware. What is more of an issue at the application level are "missing block" errors: when the low-level ECC fails, the storage device signals "unreadable sector" and one or more blocks of data are lost.

Of course this can be fixed by "block redundancy" (like RAID does), "block recovery checksums" or old-fashioned backups.

Re:What about the "block errors"? (1, Interesting)

Anonymous Coward | more than 4 years ago | (#30322902)

I think the problem is more around silent (passive) data corruption and loss.

It does become an interesting exercise when you are dealing with "off-line" type media like tape, DVD and rocks though - the greater the data density the greater the impact of media damage (entropy).

So I guess there are two parts to this problem: how often do you validate your data, and how do you mitigate large-scale errors? There are good solutions at the on-line media level (ZFS/RAID, etc.), but they are relatively weak at the offline level - anyone know of the equivalent of the RAID model for things like tape?

Cheers,

-I.

Re:What about the "block errors"? (1, Insightful)

Anonymous Coward | more than 4 years ago | (#30323010)

...two tapes.

Re:What about the "block errors"? (3, Informative)

tepples (727027) | more than 4 years ago | (#30323264)

anyone know of the equivalent RAID model for things like tape?

Four tapes data, one tape PAR2.

Re:What about the "block errors"? (1)

careysb (566113) | more than 4 years ago | (#30322986)

RAID 0 does not offer "block redundancy". If I have old-fashioned backups, how can I determine that my primary file has sustained damage? The OS doesn't tell us. In fact, the OS probably doesn't even know. Backup software often allows you to validate the backup immediately after it was made, but not, say, a year later.

The term "checksum" is over-used. A true checksum algorithm *may* tell you that a file has sustained damage. It will not, however, tell you how to correct it. A "CRC" (cyclic redundancy check) or "SHA" (secure hash algorithm) has a better chance of flagging a damaged file than a checksum. The only "correction" algorithm that I've tripped across (and there are probably others) is Reed-Solomon.
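
A minimal sketch of that detect-but-not-correct role for hashes: record a SHA-256 digest for every archived file when the backup is made, then re-verify it a year later (the manifest layout and file names are made up for the example):

import hashlib
import json
import pathlib

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def snapshot(root, manifest="manifest.json"):
    # Record a digest for every file under `root` so later damage can be detected.
    digests = {str(p): sha256_of(p) for p in pathlib.Path(root).rglob("*") if p.is_file()}
    pathlib.Path(manifest).write_text(json.dumps(digests, indent=2))

def verify(manifest="manifest.json"):
    digests = json.loads(pathlib.Path(manifest).read_text())
    for path, expected in digests.items():
        p = pathlib.Path(path)
        if not p.exists() or sha256_of(p) != expected:
            print("DAMAGED or MISSING:", path)

This only flags the damage; actually repairing it needs redundancy from somewhere else (a second copy, RAID, or Reed-Solomon style recovery data such as PAR2).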

Re:What about the "block errors"? (1)

Prof.Phreak (584152) | more than 4 years ago | (#30323082)

It means we need error correction at every level: error correction at the physical device (already in place, more or less) and error correction at the file system level (so even if a few blocks of a file are missing, the file system auto-corrects itself and still functions, up to some point of course).

About time (3, Interesting)

trydk (930014) | more than 4 years ago | (#30322802)

It is about time that somebody (hopefully some of the commercial vendors AND the open source community too) gets wise to the problems of digital storage.

I always create files with unique headers and consistent version numbering to allow for minor as well as major file format changes. For storage/exchange purposes, I make the format expandable, where each subfield/record has an individual header with a field type and a length indicator. Each field is terminated with a unique marker (two NULL bytes) to make the format resilient to errors in the headers, with possible resynchronisation through the markers. The format is in most situations backward compatible to a certain extent, as an old program can always ignore fields/subfields it does not understand in a newer-format file. If that is not an option, the major version number is incremented. This means that a version 2.11 program can read a version 2.34 file with only minor problems. It will not be able to write that format, though. The same version 2.11 program would not be able to correctly read a version 3.01 file either.

I have not implemented ECC in the formats yet, but maybe the next time I do an overhaul ... I will have to ponder that. Maybe not; my programs seem too ephemeral for that ... Then again, that is what people thought about their 1960s COBOL programs.
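
A toy version of such a format, strictly as a sketch: a type byte, a 16-bit length, the payload, and a two-NULL terminator, with unknown field types skipped for forward compatibility and resynchronisation on the terminator after a damaged header (the field sizes are assumptions, and payloads that happen to contain the terminator bytes are not handled):

import struct

FIELD_END = b"\x00\x00"  # the two-NULL marker described above

def write_field(buf, ftype, payload):
    # type (1 byte) + length (2 bytes, big-endian) + payload + terminator
    buf += struct.pack(">BH", ftype, len(payload)) + payload + FIELD_END

def read_fields(data, known_types):
    fields, i = [], 0
    while i + 3 <= len(data):
        ftype, length = struct.unpack_from(">BH", data, i)
        end = i + 3 + length
        if data[end:end + 2] == FIELD_END:        # header looks consistent
            if ftype in known_types:
                fields.append((ftype, data[i + 3:end]))
            i = end + 2                           # unknown-but-valid fields are skipped
        else:
            # Header damaged: scan forward to the next terminator and resynchronise.
            nxt = data.find(FIELD_END, i + 1)
            if nxt == -1:
                break
            i = nxt + 2
    return fields

buf = bytearray()
write_field(buf, 1, b"title: holiday photos")
write_field(buf, 7, b"field an older reader does not understand")
print(read_fields(bytes(buf), known_types={1}))   # the unknown type 7 field is ignored, not fatal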

Lossy (2, Insightful)

FlyingBishop (1293238) | more than 4 years ago | (#30322826)

The participant asked why digital file formats (jpg, mpeg-3, mpeg-4, jpeg2000, and so on) can't allow the same degradation and remain viewable.

Because all of those are compressed, and take up a tiny fraction of the space that a faithful digital recording of the information on a film reel would take up. If you want lossless-level data integrity, use lossless formats for your masters.

Do not compress! (2, Interesting)

irp (260932) | more than 4 years ago | (#30322832)

... Efficiency is the enemy of redundancy!

Old documents saved in "almost ASCII" formats are still "readable". I once salvaged a document from some obscure ancient word processor by opening it in a text editor. I also found some "images" (more like icons) on the same disk (a copy of a floppy); even these I could "read" (by changing the page width of my text editor to fit the width of the uncompressed image).

As long as the storage space keeps growing...

Very, very old news.... (2, Informative)

gweihir (88907) | more than 4 years ago | (#30322856)

It has been done like that for decades. Look at what archival tape does or DVDisaster or modern HDDs.

Also, this does not solve the problem, it just defers it. Why is this news?

How does this protect existing data files? (0)

Anonymous Coward | more than 4 years ago | (#30322858)

Oh, yeah. It doesn't.

And who gets to pay for existing apps to be rewritten?

Also, Bittorrent (4, Informative)

NoPantsJim (1149003) | more than 4 years ago | (#30322862)

I remember reading a story of a guy who had to download a file from Apple that was over 4 gigabytes, and had to attempt it several times because each attempt came back corrupted due to some problem with his internet connection. Eventually he gave up and found the file on BitTorrent, and realized that if he saved it to the same location as the corrupted file, the client would check the file and then overwrite it with the correct information. He was able to fix it in under an hour using BitTorrent, rather than trying to re-download the file while crossing his fingers and praying for no corruption.

I know it's not a perfect example, but just one way of looking at it.

Re:Also, Bittorrent (1)

ShadowRangerRIT (1301549) | more than 4 years ago | (#30323034)

Of course, if a block level hash for the .torrent file had been corrupted when he downloaded that, we'd be back to square one.

Re:Also, Bittorrent (1)

NoPantsJim (1149003) | more than 4 years ago | (#30323090)

True, but at least it's worth considering.

Wait, I know! We'll make a torrent for the .torrent file! Genius!

Re:Also, Bittorrent (1)

maxume (22995) | more than 4 years ago | (#30323228)

Bittorrent doesn't really make use of error correction, it uses error detection, and when it finds an error (a hash mismatch), it just goes and downloads another copy of that data (of course, the error does get corrected, but it requires that someone be seeding a good copy of the data).

From the perspective of a user downloading a large file, just downloading the bad parts of the file is a huge improvement over stuff like ftp or http, but without the network, it doesn't help in recovering data.
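
A sketch of that per-piece check (a real client reads the piece length and the expected SHA-1 hashes from the .torrent metainfo; here they are simply parameters):

import hashlib

def piece_hashes(data, piece_size):
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]

def damaged_pieces(data, expected_hashes, piece_size):
    # Compare each piece against its known-good hash; only the mismatching pieces
    # need to be fetched again from a peer (or taken from a second copy of the file).
    actual = piece_hashes(data, piece_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected_hashes)) if a != e]

That is why re-checking a corrupted download against the torrent is so fast: detection is local and cheap, and the network only has to supply the few pieces that failed.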

Quickpar... (1)

blahplusplus (757119) | more than 4 years ago | (#30322912)

Quite frankly, data is so duplicated today that bit rot is not really an issue if you know what tools to use, especially if you use tools like QuickPar, which can handle bad blocks, on important data.

Much data is easily duplicated; the data you want to save, if it is important, should be backed up with care.

Even though much of the data I download is easily downloaded again, the stuff I want to keep I QuickPar and burn to disc, and for really important data that is irreplaceable I make multiple copies.

http://www.quickpar.co.uk/ [quickpar.co.uk]

Re:Quickpar... (1)

Bobberly (1677220) | more than 4 years ago | (#30322980)

I was thinking the exact same thing. It works quite well for Usenet, and I've even started including PAR files on any optical media I use for backup. The overhead is negligible given how cheap storage is getting.

Re:Quickpar... (0)

Anonymous Coward | more than 4 years ago | (#30323178)

Can you (or anyone else) share more details, like the block sizes you use, tricks, things you found out, and so on? Maybe some good articles/posts?

I would also like to find more info about using RAR recovery blocks (again, things like whether to split archives into multiple files and so on) and, more generally, any good and simple backup solution on Windows that uses parity blocks.

Pretty sad that we had good solutions more than a decade ago http://www.hugolyppens.com/VBS.html [hugolyppens.com] and yet today it is hard to find simple backup software that includes parity (both in the sense of adding parity and giving you the option to verify the set).

Parchive: Parity Archive Volume Set (5, Interesting)

khundeck (265426) | more than 4 years ago | (#30322938)

Parchive: Parity Archive Volume Set

It basically allows you to create an archive that's selectively larger, but contains an amount of parity such that you can have XX% corruption and still 'unzip.'

"The original idea behind this project was to provide a tool to apply the data-recovery capability concepts of RAID-like systems to the posting and recovery of multi-part archives on Usenet. We accomplished that goal." [http://parchive.sourceforge.net/]

KPH

Solution: (2, Insightful)

Lord Lode (1290856) | more than 4 years ago | (#30322960)

Just don't compress anything: if a bit corrupts in an uncompressed bitmap file or a plain .txt file, no more than one pixel or letter is lost.

Re:Solution: (1)

Gothmolly (148874) | more than 4 years ago | (#30323202)

Not if that bit flips the ASCII value of some element in an XML file, rendering the file unparsable (by a computer at least).

Re:Solution: (1)

Lord Lode (1290856) | more than 4 years ago | (#30323286)

An XML file with a flipped bit can be restored by a human being just as well as a slightly damaged Rubens painting can.

New versions of ISO, ZIP and Truecrypt for this? (1)

Brit_in_the_USA (936704) | more than 4 years ago | (#30322972)

I would like this. Some options I could work with: extensions to the current CD/DVD/Blu-ray ISO formats, a new version of "ZIP" files, and a new version of TrueCrypt files.
If done in an open-standards way, I could be somewhat confident of support in many years' time when I may need to read the archives. Obviously, backwards compatibility with earlier ISO/file formats would be a plus.

Film and digital (2, Interesting)

CXI (46706) | more than 4 years ago | (#30322976)

Ten years ago my old company used to advocate that individuals who wanted to convert paper to digital first put the documents on microfilm and then scan them. That way, when their digital media got damaged or lost they could always recreate it. Film lasts a long, long time when stored correctly. Unfortunately that still seems to be the best advice, at least if you are starting from an analog original.

Don't worry (0)

Anonymous Coward | more than 4 years ago | (#30323026)

Quantum computers will save us. They could examine every combination possible to rebuild a file in seconds.

Re:Don't worry (1)

plastbox (1577037) | more than 4 years ago | (#30323250)

Finally! I thought I was the only one thinking about this! With enormous amounts of computing power (as quantum computing promises) one could easily store, say, a blu-ray movie by calculating and storing a few different hashes (MD5, SHA1, SHA2, etc.), then brute-force them until you have the original file back. Imagine that.. transmitting a 20GB HD movie in a single SMS. o.O

Use TCP/IP (1)

spacemky (236551) | more than 4 years ago | (#30323068)

I'd recommend just NOT using X-MODEM or Z-MODEM! Bit errors everywhere. Especially when mom picks up the telephone! ggggrrrrrrrrrrrrrrrr

Cloud computing provides an opportunity (3, Funny)

davide marney (231845) | more than 4 years ago | (#30323120)

As we're on the cusp of moving much of our data to the cloud, we've got the perfect opportunity to improve the resilience of information storage for a lot of people at the same time.

Forward-error correction instead (1)

kriston (7886) | more than 4 years ago | (#30323218)

I believe that forward error correction is an even better model. It is already used for error-free transmission of data over error-prone links in radio, and on Usenet via the PAR format; what better way to preserve data than with FEC?
Save your really precious files as Parchive files (PAR and PAR2). You can spread them over several discs, or put several of the files on just one disc.

It's one thing to detect errors, but it's a wholly different universe when you can also correct them.

http://en.wikipedia.org/wiki/Parchive [wikipedia.org]

Linearity is the real problem (2, Insightful)

designlabz (1430383) | more than 4 years ago | (#30323238)

The problem is not in error correction but in the linearity of the data. Using only 256 pixels you could represent an image the brain can interpret. The problem is that the brain cannot interpret an image from the first 256 pixels of a file, as that would just be a line about half as long as the image width, consisting of mostly irrelevant data.
If I wanted to make a fail-proof image, I would split it into blocks of, say, 9 (3x3) pixels, and then put only the central pixel (every 5th px) of each block into the byte stream. Once that is done, repeat for the surrounding pixels in each block. That way, even if part of the data is lost, the program would have at least one of the pixels in each 9-pixel block and could use it as a substitute for nearby pixels, leaving it up to the person to try and figure out the data. You could repeat the subdivision once again, achieving a pseudo-random order of bytes.
And this is just a mock-up of what could be done to improve data safety in images without increasing the actual file size.
In the old days of the internet, designers used images at lower resolution to reduce page loading time, and then gradually exchanged them for higher-res versions once those loaded. If it made sense to do it then, maybe we could now use integrated preview images to represent the average of each sector of pixels, and then reverse-calculate the missing ones using the pixels we have.
This could also work for audio files, and maybe even archives. I know I could still read the book even if every fifth letter was replaced by an incorrect one.

Cheers,
DLabz
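
A toy rendering of that interleaved ordering, strictly as a sketch: the centres of 3x3 blocks are emitted first, and a decoder that lost the tail of the stream falls back to each block's centre pixel:

def interleaved_order(width, height):
    # Block centres first, everything else afterwards.
    centres, rest = [], []
    for y in range(height):
        for x in range(width):
            (centres if (x % 3 == 1 and y % 3 == 1) else rest).append((x, y))
    return centres + rest

def reconstruct(width, height, received):
    # `received` maps (x, y) -> pixel value for however much of the stream survived.
    image = {}
    for y in range(height):
        for x in range(width):
            if (x, y) in received:
                image[(x, y)] = received[(x, y)]
            else:
                # Fall back to the centre of this pixel's 3x3 block, if we still have it.
                cx = min(3 * (x // 3) + 1, width - 1)
                cy = min(3 * (y // 3) + 1, height - 1)
                image[(x, y)] = received.get((cx, cy))
    return image

The encoder writes pixels in interleaved_order(); whatever prefix of that stream survives, the decoder pairs it back up with the same ordering and fills the gaps from the nearest transmitted centre.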

Re:Linearity is the real problem (1)

brusk (135896) | more than 4 years ago | (#30323356)

I know I could still read the book even if every fifth letter was replaced by an incorrect one.

That would depend on the book. Your brain could probably error-correct a novel easily enough under those conditions (especially if it was every fifth character, and not random characters at a rate of 20%). But I doubt anyone could follow, say, a math textbook with that many errors.

Use small hunks and do the checksum thing (1)

davidwr (791652) | more than 4 years ago | (#30323300)

The in-file checksum thing is a good idea, but it may be redundant to disk- or filesystem-level checksums.

Another useful thing is to store information in "chunks" so that if a bit goes bad, no more than one "chunk" is lost. A chunk could be a pixel or group of pixels in certain graphics formats, a page in certain "page" formats such as PDF or multi-page TIFF, a cell in a spreadsheet, a maximum-length run of characters in a word-processing document, etc.

Storing files in "ascii-like" formats where it makes sense to do so is also a good idea from a data-recovery perspective.

For files that represent "events in time" such as music, video, or some scientific data collections, a "chunk" might be a second or some other period of time.

Many of today's data formats already operate at a "chunk" level. Many do not.
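
A sketch of chunk-level framing with a per-chunk CRC, so damage is confined to a single chunk (the record layout is invented for the example, and a corrupted length field would still desynchronise the scan; real formats add sync markers for that):

import struct
import zlib

CHUNK = 64 * 1024  # chunk size is arbitrary for this sketch

def write_chunked(src_path, dst_path):
    # Each record: 4-byte length, 4-byte CRC32, then the chunk data itself.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for block in iter(lambda: src.read(CHUNK), b""):
            dst.write(struct.pack(">II", len(block), zlib.crc32(block) & 0xffffffff))
            dst.write(block)

def read_chunked(path):
    good, bad = [], 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            length, crc = struct.unpack(">II", header)
            block = f.read(length)
            if zlib.crc32(block) & 0xffffffff == crc:
                good.append(block)
            else:
                bad += 1                            # damage stays confined to this chunk
                good.append(b"\x00" * len(block))   # placeholder for the lost data
    return b"".join(good), bad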

On another note, these days, "space is cheap" on disk, but not necessarily so when it comes to networking or the time it takes to make backups. 1TB is under $100 on a home machine, several times that on a server, a relative pittance over the life of the drive. However, copying 1 TB takes a non-trivial amount of time.

Incorrect... (0)

Anonymous Coward | more than 4 years ago | (#30323442)

In the context of Hard Drives. Hard Drives are NOT digital.
They store a value via "majority". If that bit is overwritten, it can easily be recovered with the right software, several times and you are maybe needing some specific hardware.

Why don't OSes (and manufacturers) take advantage of this? There are effectively 2 layers per disc you could use to store data without degradation.
As long as the disc is kept in good condition, you can use this extra layer.
Instead, what we have are companies squeezing sectors closer together and making this method unreliable the higher the density.
Stop treating a magnetic disc as an optical disc, you can store much more on it.
This could drop the cost of drives significantly and still retain the same size as we currently have. (1gig, 2 gigs at a stretch but i still wouldn't risk it)

how many levels do you need? (1)

pydev (1683904) | more than 4 years ago | (#30323446)

There are several levels of error correction at the disk level, plus at the RAID level, plus possibly at the file system level. And the whole thing has been wrapped up so well that users don't have to worry about it. If users are still getting bit errors, someone hasn't been paying attention to their SMART, RAID, and file system logs.

No amount of error correction will protect you from that; sooner or later, disks go bad, and you have to replace them before there are too many errors for the system to recover.
