
One Developer's Experience With Real Life Bitrot Under HFS+

timothy posted about a month ago | from the so-really-it's-both-plus-and-minus dept.


New submitter jackjeff (955699) writes with an excerpt from developer Aymeric Barthe about data loss suffered under Apple's venerable HFS+ filesystem: "HFS+ lost a total of 28 files over the course of 6 years. Most of the corrupted files are completely unreadable. The JPEGs typically decode partially, up to the point of failure. The raw .CR2 files usually turn out to be totally unreadable: either completely black or having a large color overlay on significant portions of the photo. Most of these shots are not so important, but a handful of them are. One of the CR2 files in particular is a very good picture of my son when he was a baby. I printed and framed that photo, so I am glad that I did not lose the original." (Barthe acknowledges that data loss and corruption certainly aren't limited to HFS+; "bitrot is actually a problem shared by most popular filesystems. Including NTFS and ext4." I wish I'd lost only 28 files over the years.)


I've also had this happen with HFS+ (4, Informative)

carlhaagen (1021273) | about a month ago | (#47235971)

On an old partition of some 20,000 files, most of them 10 years old or older, I found 7 or 8 files - coincidentally JPEG images as well - that were corrupted. It struck me as nothing other than filesystem corruption, as the drive was and still is working just fine.

Re:I've also had this happen with HFS+ (4, Insightful)

istartedi (132515) | about a month ago | (#47236091)

coincidentally jpg images as well

Well, JPGs are usually lossy and thus compressed. Flipping one bit in a compressed image file is likely to have severe consequences. OTOH, you could coXrupt a fewYentire byteZ in an uncompressed text file and it would still be readable. I suspect your drives also had a few "typos" that you didn't notice because of that.

Legacy file systems should be illegal (1)

Anonymous Coward | about a month ago | (#47235977)

We know how to build good file systems. We have done it for years with ZFS and now Btrfs. Sticking to legacy file systems which are prone to corruption is simply not acceptable. It is about time that legislative authorities make it illegal for Apple and other negligent vendors to ship file systems that are essentially faulty by design. A noticeable fine per corrupted file would be appropriate, with the possibility of prison time for repeat incidents.

Re:Legacy file systems should be illegal (2, Informative)

Anonymous Coward | about a month ago | (#47236003)

The problem is, neither ZFS nor Btrfs would have stopped an arbitrary bit inside an arbitrary file from becoming corrupt if the disk failed to write it or read it correctly. Only multiple disks and redundancy would have solved that.

Re:Legacy file systems should be illegal (5, Insightful)

kthreadd (1558445) | about a month ago | (#47236031)

At least you would know that the file was corrupted, so that you could restore it from a good backup.

Re: Legacy file systems should be illegal (0)

Anonymous Coward | about a month ago | (#47236199)

I'm sure those backups won't suffer bit rot either.

But seriously, if you do make backups, do an incremental and then a full every so often: a full backup every month, an incremental every day, and retain backups for a full year. So every year you have 12 full sets, and then you simply overwrite/delete the oldest set as you go.

Re: Legacy file systems should be illegal (0)

Anonymous Coward | about a month ago | (#47236279)

I'm sure those backups won't suffer bit rot either.

But seriously, if you do make backups, do an incremental and then a full every so often: a full backup every month, an incremental every day, and retain backups for a full year. So every year you have 12 full sets, and then you simply overwrite/delete the oldest set as you go.

Arrgh. It ain't that simple. I hope to hell you're not backing up a filesystem used to hold a large PostgreSQL database like that. Your incrementals will be effectively as large as the full backups. Imagine how long it'd take you to restore near the end of a month. When your full backups are 100 GB and your "incremental" backups are 99 GB, might as well take a full backup every night.

FWIW, 100 GB backups are SMALL.

Re: Legacy file systems should be illegal (5, Interesting)

aix tom (902140) | about a month ago | (#47236375)

A database is something special

I basically make a "full backup" of my Oracle DBs once a week, and an "incremental backup" in the form of DB change logs every five minutes. (That is, the change logs are pushed "off site" every five minutes; of course they are written locally continuously with every change.)

The thing with backups, though, is not only to make them often but also to *check* them often. With my DBs there is a handy tool that lets me check the backup files for "flipped bits", because there are also checksums in the DB files.

For my "private backups to DVD/BR" I only fill the discs up to ~70% and fill the rest with checksum data from dvdisaster [dvdisaster.net]; for other "online backups" I create PAR2 files that I also store. With those parity files I can check "are all bits still OK?" now and then, and repair the damage when/if bits start to rot in the backup. In the 10 years I've been doing this, with ~150 DVDs and ~20 BRs so far, I've had 2 DVDs become "glitchy", but because of the checksum data I was able to repair the ISOs and re-burn them.

Basically, IF you go through the trouble of setting up an automated backup system, either with software or with your own scripts, it doesn't add much work to also add verification/checksum data to the backup. And that goes a long way toward preventing data loss due to bit rot.
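To make that concrete, here is a minimal Python sketch of the "keep a checksum manifest next to the backup and verify it later" idea. It is not the parent's actual tooling; the script name, manifest format and paths are invented for illustration, and a real setup would add PAR2/dvdisaster-style parity data so you can repair as well as detect.

    #!/usr/bin/env python3
    # manifest.py -- create or verify a SHA-256 manifest for a backup tree (illustrative sketch)
    import hashlib, json, os, sys

    def sha256_of(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def build_manifest(root):
        # Map each file's path (relative to the backup root) to its checksum.
        manifest = {}
        for dirpath, _, names in os.walk(root):
            for name in names:
                full = os.path.join(dirpath, name)
                manifest[os.path.relpath(full, root)] = sha256_of(full)
        return manifest

    if __name__ == "__main__":
        mode, root, manifest_file = sys.argv[1], sys.argv[2], sys.argv[3]
        if mode == "create":
            with open(manifest_file, "w") as f:
                json.dump(build_manifest(root), f, indent=2)
        else:  # "verify": report every file whose bits no longer match the manifest
            with open(manifest_file) as f:
                expected = json.load(f)
            for relpath, digest in expected.items():
                if sha256_of(os.path.join(root, relpath)) != digest:
                    print("MISMATCH:", relpath)

Run the "create" mode right after a backup and the "verify" mode now and then; it only detects rot, so a mismatch means "go restore this file from another copy".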

Re:Legacy file systems should be illegal (3, Interesting)

Chandon Seldon (43083) | about a month ago | (#47236035)

Btrfs (at least) can store multiple copies on one disk and use a checksum to identify the good copy to read. Obviously more disks is better, but...

Re:Legacy file systems should be illegal (0)

Anonymous Coward | about a month ago | (#47236059)

You mean... just like HFS+ can?

For all that HFS+ is a crufty old tacked-on, hacked-about mess, it actually has a surprising number of the modern features you would expect to find in ZFS- and Btrfs-like file systems: for example, compression, deduplication, and redundancy support.

I wouldn't be at all surprised if copy-on-write semantics were added at some point soon too.

Re:Legacy file systems should be illegal (4, Informative)

mgmartin (580921) | about a month ago | (#47236083)

As does ZFS (from man zfs):

    copies=1 | 2 | 3
        Controls the number of copies of data stored for this dataset. These copies are in addition to any redundancy provided by the pool, for example, mirroring or RAID-Z. The copies are stored on different disks, if possible. The space used by multiple copies is charged to the associated file and dataset, changing the used property and counting against quotas and reservations. Changing this property only affects newly-written data. Therefore, set this property at file system creation time by using the -o copies=N option.

Re:Legacy file systems should be illegal (5, Insightful)

jbolden (176878) | about a month ago | (#47236045)

Yes, absolutely great idea! Rather than having technical decisions made at tech conferences and among developers, system administrators and analysts, we should move that authority over to the legislature. Because we all know the legislature will weigh the costs and benefits of various technology choices far better than the technology marketplace does.

Apple used HFS+ because it worked to successfully migrate people from Mac OS 9, and it supported a Unix/Mac OS hybrid. They continue to use it because it has been good enough, and many of the more robust filesystems were pretty heavyweight. I'd like something like Btrfs too. But I don't think the people who disagree with me should be jailed.

Re:Legacy file systems should be illegal (1)

Anonymous Coward | about a month ago | (#47236163)

The OP's position is obviously absurd, but seriously, how well is this tech-conference decision thing going? Only a small number of operating systems, even among the FOSS ones, have modern file systems. I wouldn't mind Apple, as one of the larger operating system vendors, being kicked in the butt on this. Even Microsoft deserves its share of shame here.

Re:Legacy file systems should be illegal (-1)

Anonymous Coward | about a month ago | (#47236067)

That would mean using a modern OS. People made a fuss about Windows XP going EOL. You do the math.

Re: Legacy file systems should be illegal (0)

Anonymous Coward | about a month ago | (#47236143)

Not sure if stupid or something else.

Changing the default file system doesn't require a different OS.

Re: Legacy file systems should be illegal (3, Informative)

the_B0fh (208483) | about a month ago | (#47236357)

Not if your OS is tied intimately to your filesystem. Linux might not be, because a large number of things are abstracted out, but FreeBSD depends on its file system, and Solaris took a very long time and a lot of effort before it could boot off ZFS. Forget about moving Windows off NTFS. Apple actually did some work on putting it onto ZFS; maybe they will continue.

Re:Legacy file systems should be illegal (4, Interesting)

peragrin (659227) | about a month ago | (#47236113)

This is something everyone forgets.
It takes decades to build long term reliable file systems.

ZFS and Btrfs are less than a decade old.
Windows runs on NTFS version something-or-other. NTFS was started in what year?
HFS, and then HFS+, were built in what year?
How long has Microsoft been promising WinFS?

File systems change, but only slowly. This is good. You need a good long track record to convince people they won't lose files every ten years due to random malfunctions.

Re:Legacy file systems should be illegal (0)

Anonymous Coward | about a month ago | (#47236263)

And then there was ReiserFS, which was really gifted at deleting the data, pretending complete innocence, burying the disk drive and pretending it had run off to Russia, and trying to hide the blood soaked screwdrivers in the woods to recover them when the police aren't looking.

            http://en.wikipedia.org/wiki/Hans_Reiser

Re:Legacy file systems should be illegal (0)

Anonymous Coward | about a month ago | (#47236277)

"But he is one of us!"

Haha

Re:Legacy file systems should be illegal (0)

Anonymous Coward | about a month ago | (#47236287)

Yep. IIRC one of the early OS X releases (10.1? 10.2?) offered the option of installing using ZFS, so I tried it on an older machine and it just broke so much of OS X, although I can't recall at this point IF it was the OS or 3rd-party stuff...

I was disappointed and had meant to try it out again later (it probably was 10.2 then, because...) it was not too long after this that I was getting severely tired of $130 yearly "upgrades", the elimination of so many options for making at least the Power Macs cheaper (e.g. stripping RAM, drives, etc., which would leave you with a good price, as Apple was charging insanely HIGH prices for RAM & HDDs & GPUs back then, usually anywhere from 2.5-10x market rate), and Motorola et al. being unable to cough up decent processor upgrades. So it was back primarily to the land of Unix/Windows/*BSD for me.

Even my older PowerBooks, once 10.2 got so quickly deprecated, ended up running Yellow Dog Linux and got snappier again...

Re:Legacy file systems should be illegal (1)

the_B0fh (208483) | about a month ago | (#47236373)

Bullshit. I was running up through 10.4 or 10.5 on a 450 MHz PowerMac G4. It was more responsive (I didn't say faster) and more usable than the dual 1.4 GHz Pentium III I had.

Re:Legacy file systems should be illegal (1)

Anonymous Coward | about a month ago | (#47236423)

I had an iBook G4 1.2 GHz and 10.5 was the most sluggish thing I've ever seen. I later downgraded back to 10.4 until I retired the hardware.

Re:Legacy file systems should be illegal (1)

nine-times (778537) | about a month ago | (#47236371)

How long has Microsoft been promising WinFS?

I thought Microsoft gave up on WinFS. Are they still promising it?

Re:Legacy file systems should be illegal (0)

Anonymous Coward | about a month ago | (#47236289)

You realize it's physically impossible to ensure 100% reliable bit transmission, yes? You can push arbitrarily close, though.

What I hear from this guy... (0)

Anonymous Coward | about a month ago | (#47235979)

1. All filesystems and/or storage hardware are imperfect - and what do you mean by "integrity check"?! This thing has ROUNDED CORNERS;

2. I don't back up my data adequately;

3. For some reason, 1 is more important than 2.

Backup? (3, Insightful)

graphius (907855) | about a month ago | (#47235983)

shouldn't you have backups?

Re:Backup? (4, Insightful)

kthreadd (1558445) | about a month ago | (#47236023)

The problem with bit rot is that backups don't help. The corrupted file goes into the backup and eventually replaces the good copy, depending on retention policy. You need a file system that uses checksums on all data blocks, so that it can detect a corrupted block after reading it and flag the file as corrupted, so that you can restore it from a good backup.

Re:Backup? (5, Insightful)

dgatwood (11270) | about a month ago | (#47236139)

Depends on the backup methodology. If your backup works the way Apple's backups do, e.g. only modified files get pushed into a giant tree of hard links, then there's a good chance the corrupted data won't ever make it into a backup, because the modification wasn't explicit. Of course, the downside is that if the file never gets modified, you only have one copy of it, so if the backup gets corrupted, you have no backup.

So yes, in an ideal world, the right answer is proper block checksumming. It's a shame that neither of the two main consumer operating systems currently supports automatic checksumming in the default filesystem.
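To make "block checksumming" concrete, here is a toy Python sketch of the idea: store a digest per fixed-size block when writing, and refuse to hand back data whose block no longer matches on read. Real filesystems such as ZFS and Btrfs do this inside the block layer with far more machinery; the sidecar-file approach and the function names here are invented purely for illustration.

    import hashlib, json

    BLOCK_SIZE = 4096  # toy block size; real filesystems choose their own

    def write_with_checksums(path, data):
        # Write the data plus a sidecar list of per-block SHA-256 digests.
        with open(path, "wb") as f:
            f.write(data)
        digests = [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
                   for i in range(0, len(data), BLOCK_SIZE)]
        with open(path + ".sums", "w") as f:
            json.dump(digests, f)

    def read_verified(path):
        # Read the data back, failing loudly if any block has rotted.
        with open(path, "rb") as f:
            data = f.read()
        with open(path + ".sums") as f:
            digests = json.load(f)
        for n, expected in enumerate(digests):
            block = data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]
            if hashlib.sha256(block).hexdigest() != expected:
                raise IOError("bitrot detected in block %d of %s" % (n, path))
        return data

The point is that the read fails loudly instead of silently returning rotten bytes, so you know it is time to reach for a good backup copy.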

Re:Backup? (1, Informative)

Antique Geekmeister (740220) | about a month ago | (#47236285)

The bitrot will change the checksums and cause the files to show up as modified.

Moreover, what will you do about a reported bitrotted file unless you have genuine archival backups somewhere else?

Re:Backup? (0)

Anonymous Coward | about a month ago | (#47236345)

The bitrot will change the checksums and cause the files to show up as modified.

Not really -- unless the bitrot changes the file's modification time, it won't be detected as changed. The backup doesn't checksum the file to compare it against the backup copy; that would require reading all the data on the drive for every backup and would take more than a day for a daily backup.
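A tiny sketch of why an mtime-based incremental backup misses this: bits that flip on the platter never go through write() or utime(), so st_mtime is unchanged and the "what changed since last time?" scan skips the file forever. The function name below is made up for illustration.

    import os

    def changed_since(path, last_backup_time):
        # A Time-Machine-style scan only asks "was this file modified?"
        return os.stat(path).st_mtime > last_backup_time

    # Bitrot flips bits without touching the metadata, so changed_since()
    # keeps returning False and the corrupt file is never re-examined.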

Re:Backup? (2)

nine-times (778537) | about a month ago | (#47236381)

If I remember correctly, that's not how Apple's current backup system works. Every time a file gets written to, there's a log someplace that records that the file was modified. Next time Time Machine runs, it backs up the files in that log. If the OS didn't actually modify the file, it won't get backed up.

I may be wrong, but that's how I understood it.

Re:Backup? (0)

Anonymous Coward | about a month ago | (#47236049)

From how far back do you propose the backup be retrieved? Even tape media kept in perfect conditions may be storing a degraded copy of the file if the rot occurred before the backup was taken.

The critical point to remember is that bitrot occurs without being noticed. It could be a single bit that flips or an entire block of a file, but the file system still says "all is happy here... move along!"

Backup tools do not detect this event. File systems like ZFS and Btrfs work to prevent bitrot, but there are debates as to whether they actually do that job.

Re:Backup? (2)

ZosX (517789) | about a month ago | (#47236057)

This is a good idea, but not a solution. Often you have no idea that the file is bad until after the fact; in this case, years later. I've had MP3 collections pick up glitches here and there after a few copies between various drives. If you have no idea the data is bad in the first place, your backup of the data isn't going to be any better. I would say that all of the photography I've collected over the years has somehow stayed readable. I do check in Lightroom every once in a while, but I wouldn't be shocked to find a random unreadable file. Not good really, but there's probably not much I can do other than make sure that my files are verifiable.

Re:Backup? (1)

ColdWetDog (752185) | about a month ago | (#47236101)

I have close to 4 terabytes of photography and video stored (not that kind of photography and video). I, too, have seen occasional unreadable files, typically JPEGs but also an occasional TIFF. Any compressed container (like a JPEG) is going to be more susceptible to this issue, so JPEGs aren't a great storage format. Video files are harder to figure; a corrupted bit could easily get overlooked.

I've never actually lost a picture that I was interested in - I always have more than one copy of the image on disk, typically a TIFF and a RAW file. Yes, it would be nice if the file system didn't let this happen. No, I wouldn't believe anybody's claim that it never would. Further, it's always a risk-benefit calculation. You can spend a lot more money getting near-perfect replication, but I don't think many people are willing to have a system with ECC memory throughout the chain.

Re:Backup? (2)

rnturn (11092) | about a month ago | (#47236225)

Even if you did have backups, how could you even begin to know which saveset to restore from? You could have been backing up a corrupted file for a lo-o-ong time.

Friends wonder why I still purchase physical books and CDs. This is why. I'll have to come up with a simple 2-3 sentence explanation of the problem the OP was describing for when they ask next time. I've had MP3 files made from my CD collection mysteriously become corrupted over time. No problem, I can just re-rip/convert/etc., but losing the original digital photo of your newborn would be heartbreaking. Make several copies to reduce the odds of losing it. Make a good print using archival paper and inks and keep it away from light in a safe deposit box so it can be rescanned should the digital file become corrupted. Of course, one can go overboard; not every photo is worth that kind of effort. But it appears we might be starting to see, first-hand, the problems described in Bergeron's "Dark Ages II". Even worse, what if this [consortiuminfo.org] were to happen? (So don't even bring up the "cloud", OK?)

It's the contents of the files... (0)

Anonymous Coward | about a month ago | (#47235987)

It's the contents of the files that are corrupted. This has nothing to do with HFS+, and everything to do with a lack of redundancy, and a bad hard drive.

Re:It's the contents of the files... (1)

kthreadd (1558445) | about a month ago | (#47236037)

The point is that there are good file systems that can detect when the storage unit fails, give you an alert and allow you to restore the file from a good backup. Without this feature the corrupted file will just get backed up like any other file and eventually replace the good backup.

Re:It's the contents of the files... (0)

Chandon Seldon (43083) | about a month ago | (#47236053)

It turns out that you can have this problem even with a RAID. I'm running RAID-1 with 3 disks for my long term storage, and I need to move it from ext4 to btrfs at some point to avoid the failure case where it selects a bitrotted copy to read from. It would be nice if the RAID layer were smart enough to use the matching two of three, but that would make reads slower...

Bitrot not the fault of filesystem (5, Insightful)

Gaygirlie (1657131) | about a month ago | (#47236005)

Bitrot isn't the fault of the filesystem unless something is badly buggy; it's the fault of the underlying storage device itself. Attacking HFS+ for something like that is just silly. Now, with that said, there are filesystems out there that can guard against bitrot, most notably Btrfs and ZFS. Both can be used just like a regular filesystem, where no parity information or duplicate copies are saved, and in that case there is no safety against bitrot; but once you enable parity they can silently heal any affected files without issues. The downside? Saving parity consumes a lot more HDD space, and that's why most filesystems don't do it by default.

Re:Bitrot not the fault of filesystem (0)

Anonymous Coward | about a month ago | (#47236065)

"Attacking HFS+ for something like that is just silly." TFP says they're certainly not limited to HFS. The point was it did occur under HFS also, it's not immune.

Re:Bitrot not the fault of filesystem (1)

jbolden (176878) | about a month ago | (#47236071)

There is a third possibility. As the size of the dataset increases, you can construct a more complex error-correcting code over that dataset, with the loss of space being on the order of 1/n. Note that's essentially saving information about the decoding and then the coded information, sort of like how compression works, which for most files would be essentially free. And of course you could combine this with compression by default, which might very well result in a net savings. But then you pick up computational complexity. With extra CPUs, though, having a CPU (or hardware in the drive) dedicated to handling that isn't unreasonable.
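Not exactly the scheme described above, but a toy Python illustration of redundancy whose space cost shrinks like 1/n: one XOR parity block per group of n data blocks lets you rebuild any single block you know has been lost. Real codes (Reed-Solomon and friends) tolerate more damage; the helper names here are invented.

    from functools import reduce

    BLOCK = 4096

    def xor_blocks(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def parity_for(group):
        # One parity block for n data blocks: space overhead is 1/n.
        return reduce(xor_blocks, group)

    def recover_missing(group_with_hole, parity):
        # XOR of the parity block with the survivors reproduces the lost block.
        survivors = [blk for blk in group_with_hole if blk is not None]
        return reduce(xor_blocks, survivors, parity)

    # Example: 4 data blocks + 1 parity block (20% overhead); lose block 2, get it back.
    data = [bytes([i]) * BLOCK for i in range(4)]
    p = parity_for(data)
    assert recover_missing(data[:2] + [None] + data[3:], p) == data[2]

Parity alone only recovers a block you already know is bad; pairing it with per-block checksums (as ZFS does) is what turns silent flips into known, repairable erasures.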

Re:Bitrot not the fault of filesystem (0)

Anonymous Coward | about a month ago | (#47236131)

And of course you could combine with this compression by default which might very well result in a net savings.

Uhmmm, all common formats for large files are already compressed pretty effectively, so no, you could not add compression and get a net savings.

reading checksum + n blocks is SLOW (1)

raymorris (2726007) | about a month ago | (#47236169)

It's not a matter of CPU load. Suppose you have one checksum block for every eight data blocks. In order to verify the checksum on read, you have to read the checksum block and all eight data blocks. So you have to read a total of nine blocks instead of one. Reading from the disk is one of the slowest operations in a computer, so doing it nine times instead of once slows things down considerably.

Re:reading checksum + n blocks is SLOW (2)

jbolden (176878) | about a month ago | (#47236349)

You don't have checksum blocks in the space-efficient method. Rather, in the computational approach I'm talking about, it is a transformation. You might have something like every 6354 bits becoming 6311 bits after the complex transformation. It doesn't slow down the read, but you have to do math.

Re:Bitrot not the fault of filesystem (0)

Gaygirlie (1657131) | about a month ago | (#47236417)

Note that's essentially saving information about the decoding and then the coded information, sort of like how compression works. Which for most files would be essentially free.

No, you are talking out of your ass there. First of all, if the system worked like you explain it, then having the decode block itself get corrupted would render every single file relying on it invalid, so you'd still end up having to maintain at least a second copy of the decode block and checksums for them both, and you'd still have two points of breakage that, if ever corrupted, would render everything corrupted. That's really shitty design. On a similar note, maintaining such a decode block is far from "free": try compressing e.g. a 2-hour movie and you'll notice that most likely it only just got slightly bigger, not smaller. Then do the same with recovery mode enabled, i.e. the compression system writes a second decode block into the file, and the file will certainly get even bigger. Go on, try it, you'll see.

Re: Bitrot not the fault of filesystem (4, Insightful)

jmitchel!jmitchel.co (254506) | about a month ago | (#47236073)

Even with just checksums, knowing that there is corruption means knowing to restore from backups. And in the consumer space most people have plenty of space to keep parity if it comes to that.

Re:Bitrot not the fault of filesystem (0)

Anonymous Coward | about a month ago | (#47236295)

In fact Google has run into this problem. Hard drives do checksums, but Google still found flipped bits that matched the checksum and therefore were not detected as errors.
Google uses a lot of hard drives.

Re:Bitrot not the fault of filesystem (0)

Anonymous Coward | about a month ago | (#47236363)

Which is why there is RAID 5 with double parity...

And when done in hardware, there is no delay in reading the data, nor in the correction. RAID management can be used to detect "pre-failures", where the bit error rate gets too large, and request a replacement before actual failure.

Re:Bitrot not the fault of filesystem (0)

Anonymous Coward | about a month ago | (#47236369)

More likely it is bad RAM, and not using ECC RAM. Or flaky power supply. Maybe in the process of copying the file or saving it the data buffer got corrupted.

The hard disk (or CD or DVD, OP does not mention at all what type of physical storage he is using) has built-in error correction so the data shouldn't be easily corrupted.

Re:Bitrot not the fault of filesystem (1)

Gaygirlie (1657131) | about a month ago | (#47236389)

More likely it is bad RAM, and not using ECC RAM. Or flaky power supply. Maybe in the process of copying the file or saving it the data buffer got corrupted.

The hard disk (or CD or DVD, OP does not mention at all what type of physical storage he is using) has built-in error correction so the data shouldn't be easily corrupted.

Mmmmno. If he had bad RAM he would be having a lot more issues with the system than just 28 broken files over 6 years. And no, HDDs do not have built-in error correction; they have checksums, which is not the same thing.

Good backups aren't enough (1)

jmitchel!jmitchel.co (254506) | about a month ago | (#47236043)

Good backups aren't enough. If the filesystem isn't flagging corruption as it happens, the backup software will happily back up your corrupted data over and over until the last backup which has the valid file in it has expired or become unrecoverable itself.

Re:Good backups aren't enough (1)

drew_92123 (213321) | about a month ago | (#47236097)

I copy all of my non-changing files to a special directory and have the backup app I use compare them to another copy and alert me of any changes. At any one time I have 3-4 copies of my important files on separate disks PLUS my backups. Cuz fuck losing files! ;-)

ZFS, Apple! (2)

grub (11606) | about a month ago | (#47236047)


This is why Apple should resurrect its ZFS project. Overnight they would become the largest ZFS vendor, to go along with being the largest UNIX vendor.

Re:ZFS, Apple! (0)

Anonymous Coward | about a month ago | (#47236081)

They would also be sued pretty quickly by Oracle. Clearly not an option.

Re:ZFS, Apple! (2)

grahamsaa (1287732) | about a month ago | (#47236167)

I'm not sure this is true. Other vendors like iXsystems already sell products that ship with ZFS. As I understand it, ZFS is BSD licensed. While Oracle distributes its own version of ZFS that may (or may not) include proprietary features, the open sourced version is freely distributable. The only reason it's packaged as a userland utility for Linux is that the BSD license isn't compatible with the kernel's GPL license. Apple's kernel is definitely not GPL, so this isn't a problem for them.

One problem might be that using ZFS without ECC memory can result in data loss, and ECC memory is more expensive (and not compatible with most consumer-oriented processors that Intel makes). This would increase the cost of Apple hardware and could (possibly) be a hurdle, as Intel doesn't want to support ECC memory on their consumer-oriented processors (since that could hurt sales of more expensive server-oriented processors). But Apple is a large enough vendor that they could probably negotiate something workable with Intel.

That said, I don't know many Apple users that know what ZFS is, and it doesn't seem like there are many people clamoring for it. It would be a great addition to OSX though.

Re:ZFS, Apple! (1)

kthreadd (1558445) | about a month ago | (#47236205)

ZFS does not require ECC memory any more than any other file system does. I have no idea where you got that from.

Re:ZFS, Apple! (1)

grahamsaa (1287732) | about a month ago | (#47236271)

Of course it doesn't, and I never said that. But your chances of data corruption if you use ZFS without ECC are somewhat greater, and potentially much more catastrophic. A web search for 'ZFS without ECC' will point you to a number of horror stories. Basically, ZFS always trusts what's in memory, so if what's in memory differs from what's on disk, the contents on disk get overwritten. If this discrepancy is due to bit rot, that's great -- you've just saved your data. But if it's due to a memory error, your system proactively corrupts your data. Considering that most non-ECC DIMMs have a couple of errors a year, you will very likely lose data if you run ZFS on a system without ECC.

Of course, ECC doesn't fix everything, but it should halt your system if your RAM has an uncorrectable error, which is better than corrupting your files on disk.

Re:ZFS, Apple! (1)

kthreadd (1558445) | about a month ago | (#47236391)

I see what you mean now, but I must say that I really don't agree with these non-ECC horror stories. You have much bigger problems if you have memory corruption.

Re: ZFS, Apple! (0)

Anonymous Coward | about a month ago | (#47236171)

Oracle and Apple are allies. Apple is a huge Oracle customer. Steve Jobs and Larry Ellison were BFFs.

Re: ZFS, Apple! (0)

Anonymous Coward | about a month ago | (#47236215)

And how exactly did that end up for Steve Jobs?

Re:ZFS, Apple! (2)

Calibax (151875) | about a month ago | (#47236213)

No, they would not be sued by anyone.

Sun open-sourced ZFS under a permissive license. Oracle closed-sourced it again. However, a number of companies are supporting derivatives of the open source version.

ZFS is available for a number of operating systems today. A non-exhaustive list:
FreeBSD from iXsystems
Linux from Lawrence Livermore National Laboratory and also Pogo Linux
SmartOS from Joyent
OmniOS from Omniti
Osv from CloudOS

In addition a number of companies are using ZFS in their products:
CloudScaling
DDRdrive
datto
Delphix
GE Healthcare
Great Lakes SAN
Losytec
High-Availability
HybridCluster
Nexenta Systems
OSNEXUS
RackTop
Spectra Logic
Storiant
Syneto
WHEEL Systems
Zetavault

ZFS can detect and correct silent corruption when configured to do so. I have a NAS with 24 TB of raw storage and 16 TB of usable storage, running under OmniOS. I have well over 10 million files on the NAS (it is used as a backup for 8 systems). I haven't lost a file in 4 years and I don't expect to lose any.

Re:ZFS, Apple! (0)

Anonymous Coward | about a month ago | (#47236237)

You forget that there's this thing called patents. None of those no-names on the list is really worth suing for patent infringement, but Apple is.

Re:ZFS, Apple! (1)

Anonymous Coward | about a month ago | (#47236273)

NetApp sued Sun over ZFS saying ZFS infringed their patents. Sun (later Oracle) countersued. Both suits were settled without any money flowing either way.

ZFS is considered safe to use without threat of legal action.

Do you seriously think that dozens of companies would use it in their businesses if there was a risk of being sued out of existence by Oracle?

Re:ZFS, Apple! (1)

Calibax (151875) | about a month ago | (#47236301)

I would hesitate to call GE Healthcare a small company. I doubt that Lawrence Livermore National Labs would be considered small as it's part of the government. Joyent is the company that supports node.js.

Anyone can sue anybody about anything, but winning is a different matter. ZFS is considered safe from a legal point of view.

Re:ZFS, Apple! (1)

sribe (304414) | about a month ago | (#47236275)

Sun open sourced ZFS under a permissive license.

And NetApp claims that Sun & Oracle violated their patents.

Re:ZFS, Apple! (0)

Anonymous Coward | about a month ago | (#47236317)

But NetApp's claim was settled without any money changing hands.

Re:ZFS, Apple! (1)

ColdWetDog (752185) | about a month ago | (#47236111)

I'm curious why it's been ignored or deprecated or whatever Apple did to it. They have the resources to throw at a project like that. Presumably there was some calculation somewhere along the line that didn't make sense. Not that Apple is much for telling us things like that, but it would be fun to know.

Re:ZFS, Apple! (0)

Anonymous Coward | about a month ago | (#47236145)

I think there were licensing issues (http://gizmodo.com/5389520/licensing-issues-at-heart-of-apples-decision-to-kill-snow-leopard-zfs-plans).

Re:ZFS, Apple! (1)

kthreadd (1558445) | about a month ago | (#47236181)

So how is FreeBSD able to license ZFS by simply importing it into the source tree and Apple is not?

Re:ZFS, Apple! (0)

Anonymous Coward | about a month ago | (#47236365)

Don Brady, the engineer who championed the use of ZFS at Apple, left the company. Nobody else was interested in heading up the project so it was cancelled. At the time there was also a threat of ZFS infringing on NetApp patents which also played into the decision. That's not an issue now.

Isn't Samsung the largest UNIX vendor? *grin* (1, Informative)

sirwired (27582) | about a month ago | (#47236127)

Due to their commanding smartphone marketshare, along with millions of devices with embedded Linux shipped every year, wouldn't Samsung be the largest UNIX vendor?

Oh? What's that? You weren't counting embedded Linux and I'm a pedantic #$(*#$&@!!!. Can't argue with that!

Re:Isn't Samsung the largest UNIX vendor? *grin* (1)

jo_ham (604554) | about a month ago | (#47236151)

Now there's a can of worms. I think the question "Is Linux really Unix?" is a guaranteed heat-generator.

Re:Isn't Samsung the largest UNIX vendor? *grin* (0)

Anonymous Coward | about a month ago | (#47236209)

Close enough

Re:Isn't Samsung the largest UNIX vendor? *grin* (1)

Antique Geekmeister (740220) | about a month ago | (#47236419)

If you follow the specifications, there's no need for heat. No Linux variant has been certified according to the POSIX standards for UNIX, and most variants diverge from the POSIX standards in at least subtle ways. Wikipedia has a good note on this at http://en.wikipedia.org/wiki/S... [wikipedia.org]

Personally, I've found each UNIX to have some rather strange distinctions from the other UNIXes, and I use the GNU software base and the Linux-based software packages to assure compatibility among the different UNIX variants.

                         

28 files in 6 years is a hardware defect (1)

Anonymous Coward | about a month ago | (#47236085)

Sure, a modern filesystem should be designed to catch and possibly work around bit errors, but in the end, hardware which causes that many bit errors is defective and needs to be fixed or replaced. RAM would be my first suspect if there aren't any error messages in SMART or disk related entries in system logs. If the RAM is defective, can you really blame the filesystem? What if the files got corrupted in RAM while you were working on them?

Re:28 files in 6 years is a hardware defect (1)

kthreadd (1558445) | about a month ago | (#47236093)

How could the RAM be responsible for damaging a file between the time it was written to disk and when it was read from disk?

Re:28 files in 6 years is a hardware defect (1)

Qzukk (229616) | about a month ago | (#47236175)

The RAM is responsible for damaging the file while it sits in a buffer waiting to be written to the disk in the first place.

Re:28 files in 6 years is a hardware defect (1)

washu_k (1628007) | about a month ago | (#47236177)

Bad RAM could have corrupted the file as it was being written to disk. The file is corrupted all along, but not the disk/filesystem's fault

Or the file could have been corrupted in RAM on read, and would actually be fine if read on a working machine.

Or the disk has been replaced in those 6 years and the file was corrupted during the copy because of bad RAM

There are lots of possibilities for the file to get corrupted that don't involve the disk or filesystem.

Re:28 files in 6 years is a hardware defect (1)

v1 (525388) | about a month ago | (#47236185)

I see bad RAM cause two problems. First, when you are copying a file or editing it, and it gets saved, if the data was corrupted while it was in memory, it can become damaged when writing it. It doesn't have to affect the part of the file you were working with. If you were adding to the end of a long text document, page 2 could get damaged when you hit Save.

Second problem, more common in my experience, is directory corruption due to bad ram. When a machine would come in with a trashed directory, we used to just fix it and return it. But sometimes they'd come back again in a similar state. I'd run a memory test and find/replace a bad stick before repairing it again. Later I just got in the habit of running a short ram test anytime there was unusual directory damage. I found it in about 1 in 10 of the cases I checked. Those checks were only run in cases of severe or unusual damage though. Directory damage takes out files wholesale, and can affect data that never entered the computer, and not due to any hardware failure in the storage.

For the record, I manage over 20 TB of data here, and to date I've lost two files. One was a blonde moment with rm on a file that wasn't backed up. (I had NO idea that rm followed symlinks!) The other was a failed slice in a mirror that cost me a single document. That's over a span of over 20 years. If you've lost over 20 files in the last 10 years, you're doing something (or more probably several somethings) wrong.

Re:28 files in 6 years is a hardware defect (1)

Antique Geekmeister (740220) | about a month ago | (#47236435)

"rm" doesn't follow symlinks. However, if you have a symlink that is a directory, and hit "tab" to complete the link's name, it will put a dangling "/" on the link name. _That_ is referencing the directory from effectively "inside" the actual target directory.

I've had several conversations with colleagues over why just hitting 'tab for completion' can be hazardous. This is one of the particular cases.

Re:28 files in 6 years is a hardware defect (0)

Anonymous Coward | about a month ago | (#47236193)

The bit errors are in the live set. He kept the roughly 100 GB of pictures on his computer, where over the years they most likely got moved from disk to disk or partition to partition, possibly even from a previous computer to the current one (his photos date from as far back as eight years ago). He doesn't really say. The chance of at least 28 bit errors in a set of 100 GB going undetected by the error correction inside a hard disk is nil; the disk would have thrown plenty of read errors instead of handing over faulty data. Hard disks specify uncorrectable bit error rates of about 1 per 10 TB read, and the error correction is designed to still detect almost all of these errors. That leaves the connection to the computer, or the computer itself, as the source of the bit rot. From experience I can say that RAM isn't nearly as reliable as people would hope it to be.
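As a back-of-the-envelope check of that argument (the spec figure below is the commonly quoted consumer-drive rate of roughly one uncorrectable read error per 10^14 bits, not a number from the article):

    # Expected uncorrectable read errors from re-reading a ~100 GB photo set many times,
    # assuming the usual consumer-drive spec of about 1 URE per 1e14 bits read.
    URE_PER_BIT = 1e-14
    set_size_bits = 100e9 * 8      # ~100 GB live set
    full_reads = 50                # generous: the whole set read/copied 50 times over the years

    expected_errors = URE_PER_BIT * set_size_bits * full_reads
    print(expected_errors)         # ~0.4, and these surface as reported read errors, not silent flips

So even under heavy use the drive's own error correction should not silently hand back 28 bad files, which is why RAM, cables or the controller are the more plausible culprits.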

Detachment (1)

PopeRatzo (965947) | about a month ago | (#47236117)

The solution is to not become too attached to data. It's all ephemeral anyway, in the grand scheme of things.

Re:Detachment (1)

grasshoppa (657393) | about a month ago | (#47236149)

Well, in the "grand scheme of things", so are we.

Me? I get rather attached to the source file I've been working on for the past 6 months.

Re:Detachment (2)

NormalVisual (565491) | about a month ago | (#47236161)

Yeah, tell that to the IRS when you go to pull your records during an audit... ;-)

Re:Detachment (1)

Anonymous Coward | about a month ago | (#47236231)

Tell them you stored all of your records in e-mail messages. They'll understand the loss.

So far what I lost... (4, Interesting)

cpct0 (558171) | about a month ago | (#47236133)

Bitrot is not usually the issue for most files; sometimes, but it's rare. What I have lost comes from a mayhem of hardware and software and human failure. Thanks for backups, life :)

On Bitrot:

- MP3s and M4As that suddenly started to stutter and jump around. You play the music and it starts to skip. Luckily I have backups (read on for why I have multiple backups of everything :) ), so when I find them, I just revert to the backup.
- Images with bad sectors, like everyone else. Once or twice here or there.

- A few CDs due to CD degradation. That includes one that I really wish I still had, as it was a backup of something I lost. However, the CD takes hours to read, and then eventually either chokes on the directory or doesn't. I won't tell you about actually trying to copy the files, especially with the normal timeouts in modern OSes or the hardware pieces or whatnot.

Not Bitrot:

- Two RAID-mirror hard drives. As they were both from the same company, purchased at the same time (same batch) and in the same condition, they both died at approximately the same time, not leaving me time to transfer the data back.

- An internal hard drive, back when I was making backups to CDs. For some reason I still cannot explain, the software thought my hard drive was both the source and the destination!!!! The computer froze completely after a minute or two, then I tried rebooting to no avail, and my partition block now contained a 700 MB CD image, a quarter full with my stuff. I still don't know how that's possible, but hey, it happened. Since I was actually making my first CD at the time and it was my first backup in a year, I lost countless good files, many of which I gave up on (especially my favorite 90's music video sources, ripped from the original Betacam tapes in 4:2:2 by myself).

- A full batch of HDDs on a Mac when I tried putting the journal on another internal SSD. I have dozens of HDDs, and I thought it'd go faster to use that nifty "journal on another drive" option. It did work well, although it was hell to initialize, as I had to create a partition for each HDD and then convert them to journaled partitions. Worked awesomely, very quick, very efficient. One day, after weeks of usage, I had to hard-close the computer and its HDDs. When they remounted, they all remounted in the wrong order, somehow using the bad partition order. So imagine perfectly healthy HDDs that think they have to use another HDD's journal. Mayhem! Most drives thought they were other ones, so my music HDD became my photos HDD RAID, and my system HDD thought it was the backup HDD, but only for what was in the journal. It took me weeks of running DiskWarrior and Data Rescue to get 99% of my files back (I'm looking at you, DiskWarrior, as a 32-bit app not supporting my 9 TB photo drive), using a combination of the original drive files and the backup drive files. It took months to rebuild the Aperture database after that.

- All my pictures from when I met my wife through our first travels. I had them on a computer, and I made a copy for sure. But I cannot find any of them anywhere. Nowhere to be found, no matter where I look. Since that time, many computers have come and gone, so I don't know where they could have been sent. But I'm really sad to have lost these.

- A paid photoshoot for a unique event. I took 4 32 GB cards' worth of priceless pictures. Once done with a card, I was sifting through the pictures on my camera and noticed it had issues reading the card. I removed it immediately. At home, I put the card in my computer; it had all the trouble in the world reading it (but was able to do so), and I was (barely) able to import its contents into Aperture (4-5 pictures didn't make the cut, a few dozen had glitches). It would then (dramatically, as if it somehow breathed its last after relinquishing its precious data) not read or mount anywhere, not even being recognized as a card by the readers. Kids, use new cards regularly for your gigs :)

- A RAID array breaking, with the company nowhere to be found after all these years, and the discs not readable elsewhere.

- Countless HDDs breaking in various ways (including a cat throwing a vase full of water onto an open laptop ... yeah ... love my cats sometimes), all without consequences as I have daily backups of everything I own, and monthly offsites.

Re:So far what I lost... (0)

Anonymous Coward | about a month ago | (#47236341)

A few CDs due to CD degradation. That includes one that I really wish I still had, as it was a backup of something I lost. However, the CD takes hours to read, and then eventually either chokes on the directory or doesn't. I won't tell you about actually trying to copy the files, especially with the normal timeouts in modern OSes or the hardware pieces or whatnot.

I've had some success recovering very scratched CDs that wouldn't read by using GNU ddrescue, and I imagine it would work similarly on a CD with bitrot. If it worked, you'd end up with an ISO image of the CD, albeit likely with some corrupted areas, which you could then either mount or re-burn.

And the story is? (3, Insightful)

Immerman (2627577) | about a month ago | (#47236137)

Bitrot. It's a thing. It's been a thing since at least the very first tape drive - hell, it was a thing with punch cards (when it might well have involved actual rot). While the mechanism changes, every single consumer-level data-storage system in the history of computing has suffered from it. It's a physical phenomenon independent of the file system, and impossible to defend against in software unless the software transparently invokes the one and only defense: redundant data storage. Preferably in the form of multiple redundant backups.

So what is the point of this article?

Re:And the story is? (0)

Anonymous Coward | about a month ago | (#47236189)

Anecdotal statistics.

Clickbait generating shit (1)

Torp (199297) | about a month ago | (#47236179)

The real article would be titled "file systems with no data redundancy and no checksums are vulnerable to bitrot".
That covers just about every file system, with the lone exception of ZFS when run on a RAID, maybe Btrfs, and I guess some mainframe stuff.

article is suspect, summary is worse (4, Informative)

sribe (304414) | about a month ago | (#47236183)

In a footnote he admits that the corruption was caused by hardware issues, not HFS+ bugs, and of course the summary ignores that completely.

So, for that, let me counter his anecdote with my own anecdote: I have an HFS+ volume with a collection of over 3,000,000 files on it. This collection started in 2004, approximately 50 people access thousands of files on it per day, and occasionally after upgrades or problems it gets a full byte-to-byte comparison to one of three warm standbys. No corruption found, ever.

Re:article is suspect, summary is worse (0)

Anonymous Coward | about a month ago | (#47236305)

Dude, this is slashdot post ponies. Any submission that bashes Apple is clickbait, and they'll post cuz they need the ad revenue or something.

O! I read that as DICK rot (0)

Anonymous Coward | about a month ago | (#47236207)

Now THAT is serious.
Now that IS serious.
Now that is SERIOUS.
NOW that is serious.

Clueless article (4, Informative)

alexhs (877055) | about a month ago | (#47236227)

People talking about "bit rot" usually have no clue, and this guy is no exception.

It's extremely unlikely that a file would become silently corrupted on disk. Block devices include per-block checksums, and you either get a read error (maybe he did) or the data read is the same as the data previously written. As far as I know, ZFS doesn't help to recover data from read errors; you would need RAID and/or backups.

Main memory is the weakest link. That's why my next computer will have ECC memory. So, when you copy the file (or otherwise defragment or modify it, etc.), you read a good copy, some bit flips in RAM, and you write back corrupted data. Your disk receives the corrupted data and happily computes a checksum, thereby ensuring you can read back your corrupted data faithfully. That's where ZFS helps. Using checksumming scripts is a good idea, and I do it myself. But I don't have auto-defrag on Linux, so I'm safer: when I detect a corrupted copy, I still have the original.

ext2 was introduced in 1993, and so was NTFS. ext4 is just ext2 updated (ext was a different beast). If anything, HFS+ is more modern, not that it makes a difference. All of them have been updated over time. By the way, I noticed recently that Mac OS X resource forks sometimes contain a CRC32. I noticed it in a file coming from Mavericks.

HFS reliability (0)

Anonymous Coward | about a month ago | (#47236303)

Anyone who owned a Mac since the 80s remembers having to use Norton Disk Doctor and later DiskWarrior at least once per month to repair the filesystem. Entire folders could go randomly missing each time you booted up your Mac, and if you accidentally lost power to your hard drive, the use of one of those was mandatory.

Re:HFS reliability (3, Insightful)

sribe (304414) | about a month ago | (#47236385)

Anyone who owned a Mac since the 80s remembers having to use Norton Disk Doctor and later DiskWarrior at least once per month to repair the filesystem. Entire folders could go randomly missing each time you booted up your Mac, and if you accidentally lost power to your hard drive, the use of one of those was mandatory.

Oh yes. I remember those days well. Journaled HFS+ fixed that, and for about the last decade the only times I have encountered a corrupted file system on a Mac, that discovery was followed shortly by total failure of the hard disk.

So, what was your fucking point?

Re:HFS reliability (1)

Smurf (7981) | about a month ago | (#47236395)

Anyone who owned a Mac since the 80s remembers having to use Norton Disk Doctor and later DiskWarrior at least once per month to repair the filesystem. Entire folders could go randomly missing each time you booted up your Mac, and if you accidentally lost power to your hard drive, the use of one of those was mandatory.

No, not "anyone who owned a Mac since the 80s...". My first Mac was a Mac Plus bought in 1987 (IIRC), and I have never used those tools nor experienced the problems you mention.

Re: HFS reliability (0)

Anonymous Coward | about a month ago | (#47236397)

As someone who bought a Mac 128K in May 1984 (educational discount, $1000) and has owned only Macs at home, I virtually never had to do disk repairs - probably on the order of one per decade. Certainly not close to even yearly.

Btrfs (1)

Flammon (4726) | about a month ago | (#47236331)

I've slowly been moving all my systems to Btrfs from least important to most important and have had no problems so far.

So answer me this... (1)

trparky (846769) | about a month ago | (#47236399)

Some people are talking about the fact that bitrot could happen as a result of bad RAM. Are you talking about bad system RAM or the RAM onboard the HDD's controller board?

If it was indeed bad system RAM, wouldn't bad system RAM cause a random BSOD (Windows) or Kernel Panic (Linux)? With how much RAM we use these days it's very likely we're going to be using all of the storage capacity of each of the DIMMs that we have in our systems.

Myself, I have 16 GB of RAM in my Windows machine, and at any moment I'm using at the very least 40% of it, with spikes up to at least 60% depending on what I'm doing at the time. So with that said, I figure kernel memory structures are going to get corrupted at some point while using memory (even on the less-used DIMMs in your system). I'm not sure how the memory on the DIMMs is used, though. Is it used sequentially? (DIMM 0, chip 1... 2... 3... 4, DIMM 1, chip 1... 2... 3... 4, etc.) Or is the data thrown about randomly across the DIMMs?

Myself, if I had a random BSOD just happen, I'd be running MemTest86+ in a hot second to test my system RAM and asking Corsair (the company that made my DIMMs) for an RMA.

So if it does indeed turn out to be bad system RAM that causes this, I guess it's a good idea not to buy cheap RAM to begin with. Myself, I've never had a problem with Corsair Vengeance RAM modules, so I will continue to buy that line of Corsair memory.