One Developer's Experience With Real Life Bitrot Under HFS+

Become a fan of Slashdot on Facebook

One Developer's Experience With Real Life Bitrot Under HFS+ 396

Posted by timothy on Saturday June 14, 2014 @09:25AM from the so-really-it's-both-plus-and-minus dept.

New submitter jackjeff (955699) writes with an excerpt from developer Aymeric Barthe about data loss suffered under Apple's venerable HFS+ filesystem. HFS+ lost a total of 28 files over the course of 6 years. Most of the corrupted files are completely unreadable. The JPEGs typically decode partially, up to the point of failure. The raw .CR2 files usually turn out to be totally unreadable: either completely black or having a large color overlay on significant portions of the photo. Most of these shots are not so important, but a handful of them are. One of the CR2 files in particular, is a very good picture of my son when he was a baby. I printed and framed that photo, so I am glad that I did not lose the original. (Barthe acknowledges that data loss and corruption certainly aren't limited to HFS+; "bitrot is actually a problem shared by most popular filesystems. Including NTFS and ext4." I wish I'd lost only 28 files over the years.)

This discussion has been archived. No new comments can be posted.

One Developer's Experience With Real Life Bitrot Under HFS+

Load All Comments

Search 396 Comments Log In/Create an Account

Comments Filter:

I've also had this happen with HFS+ (Score:5, Informative)

by carlhaagen ( 1021273 ) writes: on Saturday June 14, 2014 @09:30AM (#47235971)

An old partition of some 20000 files, most of them 10 years or older, in where I found 7 or 8 files - coincidentally jpg images as well - that were corrupted. It struck me as nothing other than filesystem corruption as the drive was and still is working just fine.

Share
twitter facebook
- Re:I've also had this happen with HFS+ (Score:5, Insightful)
  
  by istartedi ( 132515 ) writes: on Saturday June 14, 2014 @09:57AM (#47236091) Journal
  
  coincidentally jpg images as well
  Well, JPGs are usually lossy and thus compressed. Flipping one bit in a compressed image file is likely to have severe consequences. OTOH, you could coXrupt a fewYentire byteZ in an uncompressed text file and it would still be readable. I suspect your drives also had a few "typos" that you didn't notice because of that.
  
  Parent Share
  twitter facebook
- Re:I've also had this happen with HFS+ (Score:4, Insightful)
  
  by Jane Q. Public ( 1010737 ) writes: on Saturday June 14, 2014 @07:14PM (#47238303)
  
  I agree with istaredi.
  
  Ultimately, it isn't a "failure" of HFS+ when your files get corrupted. It was (definitely) a hardware failure. It's just that HFS+ didn't catch the error when it happened.
  
  Granted, HFS+ is due for an update. That's something I've said myself many times. But blaming it when something goes wrong is like blaming your Honda Civic for smashing your head in when you roll it. It wasn't designed with a roll cage. You knew that but you bought it anyway, and decided to hotdog.
  
  Checksums also have performance and storage costs. So there are several different ways to look at it. One thing I strongly suggest is keeping records of your drive's S.M.A.R.T. status, and comparing them from time to time. And encourage Apple to update their FS, rather than blaming it for something it didn't cause, or for not doing something it wasn't designed to do.
  
  Parent Share
  twitter facebook
Backup? (Score:4, Insightful)

by graphius ( 907855 ) writes: on Saturday June 14, 2014 @09:36AM (#47235983) Homepage

shouldn't you have backups?

Share
twitter facebook
- Re:Backup? (Score:5, Insightful)
  
  by kthreadd ( 1558445 ) writes: on Saturday June 14, 2014 @09:42AM (#47236023)
  
  The problem with bit rot is that backups doesn't help. The corrupted file go into the backup and eventually replace the good copy depending on retention policy. You need a file system which uses checksums on all data block so that it can detect a corrupted block after reading it, flag the file as corrupted so that you can restore it from a good backup.
  
  Parent Share
  twitter facebook
  - Re:Backup? (Score:5, Insightful)
    
    by dgatwood ( 11270 ) writes: on Saturday June 14, 2014 @10:09AM (#47236139) Homepage Journal
    
    Depends on the backup methodology. If your backup works the way Apple's backups do, e.g. only modified files get pushed into a giant tree of hard links, then there's a good chance the corrupted data won't ever make it into a backup, because the modification wasn't explicit. Of course, the downside is that if the file never gets modified, you only have one copy of it, so if the backup gets corrupted, you have no backup.
    So yes, in an ideal world, the right answer is proper block checksumming. It's a shame that neither of the two main consumer operating systems currently supports automatic checksumming in the default filesystem.
    
    Parent Share
    twitter facebook
    - Re: (Score:2, Informative)
      
      by Antique Geekmeister ( 740220 ) writes:
      
      The bitrot will change the checksums and cause the files to show up as modified.
      Moreover, what will you do about a reported bitrotted file unless you have genuine archival backups somewhere else?
      - Re: (Score:3)
        
        by nine-times ( 778537 ) writes:
        
        If I remember correctly, that's not how Apple's current backup system works. Every time a file gets written to, there's a log someplace that records that the file was modified. Next time Time Machine runs, it backs up the files in that log. If the OS didn't actually modify the file, it won't get backed up.
        I may be wrong, but that's how I understood it.
        
        Re: (Score:2, Informative)
        
        by Anonymous Coward writes:
        
        Macosx Time Machine works by listening to filesystem events except for the first backup where everything is copied over as is. Bit rot doesn't get transferred until you overwrite the file, time by which it should have been obvious something was fishy or the bitrot was negligible and you didn't notice yourself. There are also situations where Time Machine itself says "this backup is fishy, regenerate from scratch?". Happened last week, but only after a failed drive had to be replaced which caused a 150GB bac
- Re: (Score:3)
  
  by ZosX ( 517789 ) writes:
  
  This is a good idea, but not a solution. Often you have no idea that the file is bad until after the fact, in this case years later. I've had mp3 collections get glitches here and there after a few copies from various drives. If you have no idea the data is bad in the first place, your backup of the data isn't going to be any better. I would say that all of my photography I've collected over the years has stayed readable somehow. I do check in lightroom every once in a while, but I wouldn't be shocked to fi
  - Re: (Score:2)
    
    by ColdWetDog ( 752185 ) writes:
    
    I have close to 4 terabytes of photography and video stored (not that kind of photography and video). I, too, have seen occasional unreadable files, typically in JPEGS but also an occasional TIFF file. Any compressed container (like a JPEG) is going to be more susceptible to this issue thus JPEGs aren't a great storage format. Video files are harder to figure - a corrupted bit could easily get overlooked.
    I've never actually lost a picture that I was interested in - I always have more than one copy of th
    - Re: (Score:2)
      
      by wagnerrp ( 1305589 ) writes:
      
      Video files are harder to figure - a corrupted bit could easily get overlooked.
      Again, it depends on whether it is compressed or not. A corrupted bit in video with only interframe compression will look just like a damaged JPEG. You may have an unreadable frame, or may have a corrupted macroblock or two in that frame. A corrupted bit in video with intraframe compression will smear that corrupted frame or macroblock for potentially several seconds until you hit the next I-frame to flush the image.
      You can spend a lot more money getting near perfect replication but I don't think many people are willing to have a system with ECC memory throughout the chain.
      The common solution to this issue is software, not hardware. You have your filesystem co
- Re: (Score:3)
  
  by rnturn ( 11092 ) writes:
  
  Even if you did have backups how could you even begin to know which saveset to restore from? You could have been backing up a corrupted file for a lo-o-ong time.
  Friends wonder why I still purchase physical books and CDs. This is why. I'll have to come up with a simple 2-3 sentence explanation of the problem the OP was describing for when they ask next time. I've had MP3 files made from my CD collection mysteriously become corrupted over time. No problem, I can just re-rip/convert/etc. but losing the origi
- Re: (Score:3)
  
  by gweihir ( 88907 ) writes:
  
  You should have:
  1. backups
  2. redundancy
  3. regular integrity checks of your data
  Or alternatively, you should have been using an archival grade medium, like archival tape or (historically now unfortunately) MOD.
  What the OP did is just plain incompetent and stupid and if he had spent 15 minutes to find out how to properly archive data, he would now not be in this fix. Instead he made assumptions without understanding or verification against the real world now blames others for his failure. Pathetic. Dunning-Kr
Comment removed (Score:5, Insightful)

by account_deleted ( 4530225 ) writes: on Saturday June 14, 2014 @09:40AM (#47236005)

Comment removed based on user account deletion

Share
twitter facebook
- Re: (Score:2)
  
  by jbolden ( 176878 ) writes:
  
  There is a 3rd possibility. As the size of the dataset increases you can construct a more complex error correcting code on that dataset with loss of spacing being 1/n. Note that's essentially saving information about the decoding and then the coded information, sort of like how compression works. Which for most files would be essentially free. And of course you could combine with this compression by default which might very well result in a net savings. But then you pick up computational complexity. Wi
  - reading checksum + n blocks is SLOW (Score:2)
    
    by raymorris ( 2726007 ) writes:
    
    It's not a matter of CPU load. Suppose you have one checksum block for every eight data blocks. In order to verify the checksum on read, you have to read the checksum block and all eight data blocks. So you have to read a total of nine blocks instead of one. Reading from the disk is one if the slowest operations in a computer, so ddoing it nine times instead of one slows things down considerably.
    - Re: (Score:3)
      
      by jbolden ( 176878 ) writes:
      
      You don't have checksum blocks in the space efficient method. Rather in the computational way I'm talking about it is a transformation. You might have something like every 6354 bits becomes 6311 bits after the complex transformation. It doesn't slow down the read but you have to do math.
- Re: Bitrot not the fault of filesystem (Score:5, Insightful)
  
  by jmitchel!jmitchel.co ( 254506 ) writes: on Saturday June 14, 2014 @09:52AM (#47236073)
  
  Even with just checksums, knowing that there is corruption means knowing to restore from backups. And in the consumer space most people have plenty of space to keep parity if it comes to that.
  
  Parent Share
  twitter facebook
- - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - Re: (Score:2)
      
      by fnj ( 64210 ) writes:
      
      And no, HDDs do not have built-in error correction, they have checksums -- those things are not the same thing.
      Sorry to inform you that your knowledge on this subject is not perfectly correct and inclusive. Hard drives use per-sector ECC. ECC stands for Error Correction Code. The very term tells you its function is to do precisely what you say is not done. Here is one tutorial [pcguide.com]. This stuff is pretty basic and widely known.
      "When a sector is written to the hard disk, the appropriate ECC codes are generated and
      - Re: (Score:2)
        
        by Electricity Likes Me ( 1098643 ) writes:
        
        Hard disks use ECC to allow the disk to reach the capacities it does. It is not designed for anything other then making the hard disk perform well. It doesn't protect you against hard disks which write incorrect data to start with, or have faulty cables etc.
- - - Re: (Score:2)
      
      by wagnerrp ( 1305589 ) writes:
      
      Isn't that RAID6?
Good backups aren't enough (Score:2)

by jmitchel!jmitchel.co ( 254506 ) writes:

Good backups aren't enough. If the filesystem isn't flagging corruption as it happens, the backup software will happily back up your corrupted data over and over until the last backup which has the valid file in it has expired or become unrecoverable itself.
ZFS, Apple! (Score:3)

by grub ( 11606 ) writes: <slashdot@grub.net> on Saturday June 14, 2014 @09:48AM (#47236047) Homepage Journal

This is why Apple should resurrect its ZFS project. Overnight they would be the largest ZFS vendor to match with being the largest UNIX vendor.

Share
twitter facebook
- Re: (Score:2)
  
  by ColdWetDog ( 752185 ) writes:
  
  I'm curious why it's been ignored or deprecated or whatever Apple did to it. They have the resources to throw at a project like that. Presumably there was some calculation somewhere along the line that didn't make sense. Not that Apple is much for telling us things like that, but it would be fun to know.
  - Re: (Score:3)
    
    by jbolden ( 176878 ) writes:
    
    Apple did announce why the project failed. ZFS on consumer grade hardware with consumer interactions was too dangerous. Things like pulling an external drive out during mid write could corrupt an entire ZFS volume. Apple simply couldn't get ZFS to work under the conditions their systems need it to. They had to backout completely and come up with a plan-B. The developer who worked on this left Apple and now produces a better ZFS for OSX. That company got bought by Oracle so Oracle owns it now.
  - - Re: (Score:2)
      
      by kthreadd ( 1558445 ) writes:
      
      So how is FreeBSD able to license ZFS by simply importing it into the source tree and Apple is not?
- Isn't Samsung the largest UNIX vendor? *grin* (Score:2, Informative)
  
  by sirwired ( 27582 ) writes:
  
  Due to their commanding smartphone marketshare, along with millions of devices with embedded Linux shipped every year, wouldn't Samsung be the largest UNIX vendor?
  Oh? What's that? You weren't counting embedded Linux and I'm a pedantic #$(*#$&@!!!. Can't argue with that!
  - Re: (Score:2)
    
    by jo_ham ( 604554 ) writes:
    
    Now there's a can of worms. I think the question "Is Linux really Unix?" is a guaranteed heat-generator.
    - Re: (Score:2)
      
      by Antique Geekmeister ( 740220 ) writes:
      
      If you follow the specifications, there's no need for heat. No Linux variant has been certified according to the POSIX standards for UNIX, and most variants have subtle ways in which they diverge from the POSIX standards, at least subtly. Wikipedia has a good note on this at http://en.wikipedia.org/wiki/S... [wikipedia.org]
      Personally, I've found each UNIX to each have some rather strange distinctions from the other UNIX's, and using the GNU software base and the Linux based software packages to assure compatibility among t
- - Re: (Score:3)
    
    by grahamsaa ( 1287732 ) writes:
    
    I'm not sure this is true. Other vendors like iXsystems already sell products that ship with ZFS. As I understand it, ZFS is BSD licensed. While Oracle distributes its own version of ZFS that may (or may not) include proprietary features, the open sourced version is freely distributable. The only reason it's packaged as a userland utility for Linux is that the BSD license isn't compatible with the kernel's GPL license. Apple's kernel is definitely not GPL, so this isn't a problem for them.
    
    One proble
    - Re: (Score:2)
      
      by kthreadd ( 1558445 ) writes:
      
      ZFS does not require ECC memory more than any other file system. I have no idea where you got that from.
      - Re: (Score:2)
        
        by grahamsaa ( 1287732 ) writes:
        
        Of course it doesn't, and I never said that. But your chances of data corruption if you use ZFS without ECC are somewhat greater, and potentially much more catastrophic. A web search for 'ZFS without ECC' will point you to a number of horror stores. Basically, ZFS always trusts what's in memory, so if what's in memory differs from what's on disk, the contents on disk get overwritten. If this discrepancy is due to bit rot, that's great -- you've just saved your data. But if it's due to a memory error, y
        
        Re: (Score:2)
        
        by kthreadd ( 1558445 ) writes:
        
        I see what you mean now, but I must say that I really don't agree with these non-ECC horror stories. You have much bigger problems if you have memory corruption.
        
        Re: (Score:2)
        
        by nabsltd ( 1313397 ) writes:
        
        You have much bigger problems if you have memory corruption.
        If you don't use ECC memory, you will have memory corruption. Even if you do use ECC memory, you might have corruption, and it might even go unnoticed, but the odds are far less likely.
        "Corruption" in this sense doesn't mean that whole DIMMs are broken...it just means that one bit has changed in a way that the user/OS/CPU didn't want it to. In many cases, this can be completely harmless (e.g., graphical data used in-memory only has one bit wrong...you might not even notice a color shift if it's the LSB),
        
        Re: (Score:2)
        
        by kthreadd ( 1558445 ) writes:
        
        Then it's much simpler. This ECC issue has absolutely nothing to do with ZFS. You should use ECC RAM if you are doing any form of disk IO no matter which file system you're using, or you are under the risk of data loss.
        
        Re: (Score:3)
        
        by Kaenneth ( 82978 ) writes:
        
        Back when I did tech support for a lightweight Mac database product, they didn't use Parity (much less ECC) RAM.
        I had a customer call in because students were continually getting corrupted databases on their assignments.
        over the course of several phone calls, we narrowed it down to only happening in 1 of 3 labs.
        After excluding anything high-energy (like a physics lab) in the building, I got the customer to reveal that they were constructing a new building next door, and the construction power tools were run
  - Re: (Score:3)
    
    by Calibax ( 151875 ) * writes:
    
    No they would not be sued by anyone.
    Sun open sourced ZFS under a permissive license. Oracle close sourced it again. However, a number of companies are supporting derivatives of the open source version.
    ZFS is available for a number of operating systems today. A non-inclusive list:
    FreeBSD from iXsystems
    Linux from Lawrence Livermore National Laboratory and also Pogo Linux
    SmartOS from Joyent
    OmniOS from Omniti
    Osv from CloudOS
    In addition a number of companies are using ZFS in their products:
    CloudScaling
    DDRdrive
    - Re: (Score:2)
      
      by sribe ( 304414 ) writes:
      
      Sun open sourced ZFS under a permissive license.
      And NetApp claims that Sun & Oracle violated their patents.
    - - Re: (Score:2)
        
        by Calibax ( 151875 ) * writes:
        
        I would hesitate to call GE Healthcare a small company. I doubt that Lawrence Livermore National Labs would be considered small as it's part of the government. Joyent is the company that supports node.js.
        Anyone can sue anybody about anything, but winning is different matter. ZFS is considered safe from a legal point of view.
  - Re: (Score:2)
    
    by fnj ( 64210 ) writes:
    
    They would also be sued pretty quickly by Oracle. Clearly not an option.
    Your conclusion is a bit hasty and unwarranted. I am not going to tell you that Oracle CANNOT sue anyone for any trumped-up reason, but ZFS is licensed under the Common Development and Distribution License (CDDL) and is open source [rtt-law.com]. For linux, there is an issue with how CDDL plays with GPL, so no distro has yet bundled ZFS with linux. Linux users, however, can themselves pick up "ZFS on Linux" [zfsonlinux.org] and install it themselves without violating
Detachment (Score:2)

by PopeRatzo ( 965947 ) writes:

The solution is to not become too attached to data. It's all ephemeral anyway, in the grand scheme of things.
- Re: (Score:2)
  
  by grasshoppa ( 657393 ) writes:
  
  Well, in the "grand scheme of things", so are we.
  Me? I get rather attached to the source file I've been working on for the past 6 months.
- Re: (Score:3)
  
  by NormalVisual ( 565491 ) writes:
  
  Yeah, tell that to the IRS when you go to pull your records during an audit... ;-)
So far what I lost... (Score:5, Interesting)

by cpct0 ( 558171 ) writes: <slashdot.micheldonais@com> on Saturday June 14, 2014 @10:08AM (#47236133) Homepage Journal

Bitrot is not usually the issue for most files. Sometimes, but it's rare. What I lost is a mayhem repository of hardware and software and human failure. Thanks for backup, life :)
On Bitrot:
- MP3s and M4As I had that suddenly started to stutter and jump around. You play the music and it starts to skip. Luckily I have backups (read on for why I have multiple backups of everything :) ) so when I find them, I just revert to the backup.
- Images having bad sectors like everyone else. Once or twice here or there.
- A few CDs due to CD degradation. That includes one that I really wish I'd still have, as it was a backup of something I lost. However, the CD takes hours to read, and then eventually either balks up or not for the directory. I won't tell you about actually trying to copy the files, especially with normal timeouts in modern OSes or the hardware pieces or whatnot.
Not Bitrot:
- Two RAID Mirror hard drives, as they were both the same company, and purchased at the same time (same batch), in the same condition, they both balked at approximately the same time, not leaving me time to transfer data back.
- An internal hard drive, as I was making backups to CDs (at that time). For some kind of reason I still cannot explain, the software thought my hard drive was both the source and the destination !!!! Computer froze completely after a minute or two, then I tried rebooting to no avail, and my partition block was now containing a 700mb CD image, quarter full with my stuff. I still don't know how that's possible, but hey, it did. Since I was actualy making my first CD at the time and it was my first backup in a year, I lost countless good files, many I gave up upon (especially my 90's favorite music video sources ripped from the original betacam tapes in 4:2:2 by myself).
- A full bulk of HDs on Mac when I tried putting the journal to another internal SSD drive. I have dozens of HDDs, and I thought it'd go faster to use that nifty "journal on another drive" option. It did work well, although it was hell to initialize, as I had to create a partition for each HDD, then convert them to journaled partitions. Worked awesomely, very quick, very efficient. One day after weeks of usage, I had to hard close the computer and its HDD. When they remounted, they all remounted in the wrong order, somehow using the bad partition order. So imagine you have perfectly healthy HDDs but thinking they have to use another HDDs journal. Mayhem! Most drives thought they were other ones, so my music HDD became my photos HDD RAID, my system HDD thought it was the backup HDD, but just what was in the journal. It took me weeks sporting DiskWarrrior and Data Rescue in order to get 99% of my files back (I'm looking at you, DiskWarrior as a 32 bit app not supporting my 9TB photo drive) with a combinaison of the original drive files and the backup drive files. Took months to rebuild the Aperture database from that.
- All my pictures from when I met my wife to our first travels. I had them in a computer, I made a copy for sure. But I cannot find any of that anywhere. Nowhere to be found, no matter where I look. Since that time, many computers happened, so I don't know where it could've been sent. But I'm really sad to have lost these
- Did a paid photoshoot for an unique event. Took 4 32GB cards worth of priceless pictures. Once done with a card, I was sifting through the pictures with my camera and noticed it had issues reading the card. I removed it immediately. When at home, I put the card in my computer, it had all the troubles in the world reading it (but was able to do so), I was (barely) able to import its contents to Aperture (4-5 pictures didn't make the cut, a few dozens had glitches). It would then (dramatically, as it somehow have its last breath after relinquishing its precious data) not read or mount anywhere, not even being recognized as a card by the readers. Childs, use new cards regularly for your gigs :)
- A RAID array b
Read the rest of this comment...

Share
twitter facebook
- - Re: (Score:2)
    
    by Electricity Likes Me ( 1098643 ) writes:
    
    Scratched CDs can be recovered by polishing them with Brasso.
    The trick is it polishes the scratches out flat so they don't mess with reflections of the reader.
  - Re: So far what I lost... (Score:2)
    
    by cpct0 ( 558171 ) writes:
    
    Yeah, with scratches it's a valid assumption. However, in my case it was cheap CDs with inks that degraded, so the reflectivity of the data itself was degraded, the drive was ultimately unable to retrieve data on most sectors, or it was able after dozen of reads over the same block of data, until the data got its green flag from the recovery algorithm embedded in Data-CD format specs.
    A scratch is localized. CD dye degradation is global. But thanks for the idea.
And the story is? (Score:4, Insightful)

by Immerman ( 2627577 ) writes: on Saturday June 14, 2014 @10:08AM (#47236137)

Bitrot. It's a thing. It's been a thing since at least the very first tape drive - hell it was a thing with punch cards (when it might well have involved actual rot). While the mechanism changes, every single consumer-level data-storage system in the history of computing has suffered from it. It's a physical phenomena independent from file system, and impossible to defend against in software unless it transparently invokes the one and only defense: redundant data storage. Preferably in the form of multiple redundant backups.
So what is the point of this article?

Share
twitter facebook
Clickbait generating shit (Score:2)

by Torp ( 199297 ) writes:

The real article would be titled "file systems with no data redundancy and no checksums are vulnerable to bitrot".
That covers about any file system with the lone exception of ZFS when ran on a raid, maybe btrfs? and i guess some mainframe stuff.
article is suspect, summary is worse (Score:5, Informative)

by sribe ( 304414 ) writes: on Saturday June 14, 2014 @10:15AM (#47236183)

In a footnote he admits that the corruption was caused by hardware issues, not HFS+ bugs, and of course the summary ignores that completely.
So, for that, let me counter his anecdote with my own anecdote: I have an HFS+ volume with a collection of over 3,000,000 files on it. This collection started in 2004, approximately 50 people access thousands of files on it per day, and occasionally after upgrades or problems it gets a full byte-to-byte comparison to one of three warm standbys. No corruption found, ever.

Share
twitter facebook
- Re: (Score:2)
  
  by pla ( 258480 ) writes:
  
  In a footnote he admits that the corruption was caused by hardware issues, not HFS+ bugs, and of course the summary ignores that completely.
  
  The summary doesn't claim HFS caused the bitrot, you read that into it. The summary merely points out that HFS doesn't reliably detect and correct flaws in the underlying storage media (as does NTFS, as does almost every filesystem widely used).
  
  More importantly, while merely detecting this issue may not incur too much overhead, correcting it requires some fairly l
  - Re: (Score:2)
    
    by sribe ( 304414 ) writes:
    
    The summary doesn't claim HFS caused the bitrot, you read that into it.
    The summary's first sentence ends: "about data loss suffered under Apple's venerable HFS+ filesystem" and shortly thereafter it continues with: "HFS+ lost a total of 28 files over the course of 6 years." So the chosen wording most certainly does imply that HFS is at fault. One has to click the link to the article, then read all the way through the frickin' footnotes before one encounters anything to explicitly disavow that implication.
Clueless article (Score:5, Informative)

by alexhs ( 877055 ) writes: on Saturday June 14, 2014 @10:27AM (#47236227) Homepage Journal

People talking about "bit rot" usually have no clue, and this guy is no exception.
It's extremely unlikely that a file would become silently corrupted on disk. Block devices include per-block checksums, and you either have a read error (maybe he has) or the data read is the same as the data previously written. As far as I know, ZFS doesn't help to recover data from read errors. You would need RAID and / or backups.
Main memory is the weakest link. That's why my next computer will have ECC memory. So, when you copy the file (or otherwise defragment or modify the file, etc), you read a good copy, some bit flips in RAM, and you write back corrupted data. Your disk receives the corrupted data, happily computes a checksum, therefore ensuring you can read back your corrupted data faithfully. That's where ZFS helps. Using checksumming scripts is a good idea, and I do it myself. But I don't have auto-defrag on Linux, so I'm safer : when I detect a corrupted copy, I still have the original.
ext2 was introduced in 1993, and so was NTFS. ext4 is just ext2 updated (ext was a different beast). If anything, HFS+ is more modern, not that it makes a difference. All of them are updated. By the way, I noticed recently that Mac OS X resource forks sometimes contain a CRC32. I noticed it in a file coming from Mavericks.

Share
twitter facebook
- Re: (Score:3)
  
  by rabtech ( 223758 ) writes:
  
  People talking about "bit rot" usually have no clue, and this guy is no exception.
  It's extremely unlikely that a file would become silently corrupted on disk. Block devices include per-block checksums, and you either have a read error (maybe he has) or the data read is the same as the data previously written. As far as I know, ZFS doesn't help to recover data from read errors. You would need RAID and / or backups.
  I'm afraid it is you who is clueless. Up until ZFS started gaining traction, we all had the luxury of assuming the storage chain was reliable (RAM, SATA controller, cables, drive firmware, read/write heads, oxide layers, etc). Or at least we would know something went wrong.
  But it was found that in the actual real world, these systems all silently corrupt data from time to time. The problem is much worse as the volume of data grows because the error rates are basically unchanged, meaning what was once expect
Btrfs (Score:2)

by Flammon ( 4726 ) writes:

I've slowly been moving all my systems to Btrfs from least important to most important and have had no problems so far.
- Re: (Score:2)
  
  by wisnoskij ( 1206448 ) writes:
  
  Btrfs "pronounced "Butterface"" - Wikipedia
  Lol.
  Strangely that acronym could also stand for BiT Rot Free System which is pretty ironic, I guess.
So answer me this... (Score:2)

by trparky ( 846769 ) writes:

Some people are talking about the fact that bitrot could happen as a result of bad RAM. Are you talking about bad system RAM or the RAM onboard the HDD's controller board?

If it was indeed bad system RAM, wouldn't bad system RAM cause a random BSOD (Windows) or Kernel Panic (Linux)? With how much RAM we use these days it's very likely we're going to be using all of the storage capacity of each of the DIMMs that we have in our systems.

Myself I have 16 GBs of RAM in my Windows machine and at any moment i
- It's all about ERROR rates (Score:3)
  
  by bussdriver ( 620565 ) writes:
  
  RAM may have a low error rate much better than HDDs or SDs. That does not mean that you won't have errors even if you have a good brand and treat it well. Bit-level errors can and do happen all the time without us knowing; other times it happens in the wrong place and we notice (but think it is something else) it isn't until it gets really bad that we notice.
  Example, say your RAM has a 1% bit loss rate (ignore that is insanely high) well if 90% of your data is not touchy code but data, the odds are that
  - Re: (Score:2)
    
    by trparky ( 846769 ) writes:
    
    I have noticed that a lot of OEMs (Dell, HP, Apple, etc.) use a no-name brand of RAM in many of their systems that they build. If you look at them, especially the CAS latency stats, you'll notice that many of the RAM chips found in most pre-made computers are absolutely pitiful (to say the least).
    
    So with that being said, who knows if this no-name RAM that is installed in many pre-made computers that many people buy is of any real quality. I'm guessing... no. So, with that said perhaps that odds of bitr
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    ECC is not what you need for reliable data archiving. What you need is independent checksums and you need to actually compare them to the data on disk. If you store an MD5 or SHA1 hash with all files, corruption from RAM, buses and the like will not go undetected. The way things go today though, most people do not even verify a backup. No surprise they lose data, incompetence and laziness comes at a price. Of course, you should make sure your RAM runs stable, but I have not had a single ECC corrected bit in
- Re: (Score:2)
  
  by fnj ( 64210 ) writes:
  
  If it was indeed bad system RAM, wouldn't bad system RAM cause a random BSOD (Windows) or Kernel Panic (Linux)?
  Likely so, but if we are talking about errors that only show up in 28 file-reads out of millions of file-reads, there is no reason to believe that you would be bound to see such a panic during the period in question.
  BTW, bad RAM anywhere in the chain from disk drive to CPU - main system RAM, CPU cache RAM, hard drive cache RAM, controller RAM, etc - could cause such a panic, since most data travels
- - Re: (Score:2)
    
    by trparky ( 846769 ) writes:
    
    If I were in your shoes, if that module failed a MemTest (even just one pass) then that module will be getting replaced with an RMA from the RAM manufacturer. I don't care if the system is stable, if that module failed... it's getting replaced.
how is this a file system problem? (Score:2)

by stenvar ( 2789879 ) writes:

This sounds like actual disk errors. File systems can't do much about them, you really need something like a RAID.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  The OP did it wrong due to stupidity or laziness and now he is blaming others like an immature, petulant child would do.
- Re: (Score:3)
  
  by kthreadd ( 1558445 ) writes:
  
  The file system can do quite a bit if it actually does consistency checks on the data when reading it. ZFS does this and will alert you if the contents of a file has changed after it was last written, allowing you to restore a good copy from backup and verify that it is still valid.
Those that are incompetent will lose their data (Score:2)

by gweihir ( 88907 ) writes:

There are only two options for reliable data archiving: 1. Spinning disks with redundancy and regular checks 2. Archival grade tape. There used to be MOD as well, but as nobody cared enough to buy it, development stalled and then died. The OP simply was naive and stupid and did not bother to find out how to archive data properly. It is well-known how to do it and has been for a long time. I have not lost a single bit that I care about. Of course, I have a 3-way RAID1 with regular SMART and RAID consistency
- - Re: (Score:3)
    
    by FaxeTheCat ( 1394763 ) writes:
    
    Losing a day's or even an hour's data entry is not an option.
    If you have that kind of requirements (less than an hour lost data), then you are not looking for just backup/archive. You are looking for a fully redundant storage system.
    In addition to the backup system, of course.
    
    For reading, check up on backupentral.com, Symantec.com (Backup Exec/Netbackup) emc.com (Avamar, networker).
    I once managed a Filemaker database server (v5), and it has a built in featuer to copy the database files for backup. Real simple. Cannot remember if the database had to be taken offli
  - - Re: (Score:3)
      
      by WheezyJoe ( 1168567 ) writes:
      
      Simple: It is a "Datasheet" covering an "archival grade medium". If you do not know that, you have absolutely no business working on any kind of "mission critical" storage, as you are simply incompetent with regard to that subject.
      Easy, there, big fella. Posting a link to a datasheet would have sufficed. Ain't right to call a man incompetent for asking a question. Truly, an incompetent is one who don't never ask the question assuming he already knows. Credit is due for seeking to learn something.
- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  The problem is, neither ZFS or Btrfs would have stopped an arbitrary bit inside an arbitrary file from becoming corrupt if the disk failed to write it or read it correctly. Only multiple disks and redundancy would have solved that.
  - Re:Legacy file systems should be illegal (Score:5, Insightful)
    
    by kthreadd ( 1558445 ) writes: on Saturday June 14, 2014 @09:43AM (#47236031)
    
    At least you would know that the file was corrupted, so that you could restore it from a good backup.
    
    Parent Share
    twitter facebook
    - - Re: (Score:2)
        
        by wagnerrp ( 1305589 ) writes:
        
        Actually, no, they won't. The chances of bitrot occurring in the same location on both your primary store and your backup, such that neither had viable data to recover, is astronomically low.
        
        Re: Legacy file systems should be illegal (Score:4, Informative)
        
        by O('_')O_Bush ( 1162487 ) writes: on Saturday June 14, 2014 @01:30PM (#47237023)
        
        Vs the chances you back up already corrupted files and don't notice until you've aged off the good versions.
        
        Parent Share
        twitter facebook
      - Re: Legacy file systems should be illegal (Score:5, Interesting)
        
        by aix tom ( 902140 ) writes: on Saturday June 14, 2014 @11:01AM (#47236375)
        
        A database is something special
        I basically make a "full backup" of my Oracle DBs once a week, and a "incremental backup" in the form of DB change logs every five minutes. (that is, the change logs are pushed "off site" every five minutes, of course they are being written locally continuously with every change.
        The thing with backups, though, is not only to make them often but to also *check* them often. With my DBs there is a handy tool where I can check the backup files for "flipped bits" because there are also checksums in the DB files.
        For my "private backups to DVD/BR" I only fill them up to ~70%, and fill the rest of the disk with checksum data with dvdisaster. [dvdisaster.net], for other "online backups" I create PAR2 files that I also store. With those parity files I can check "are all bits still OK?" now and then, and repair the damage when/if bits start to rot in the backup. In the 10 years I do this, with ~150 DVDs and ~20BRs so far I had 2 DVDs that became "glitchy", but because of the checksum data I was able to repair the ISO and re-burn them.
        Basically, IF you go through the trouble of setting up an automated backup system either with software or with your own scripts, It doesn't add much work to also add verification/checksum data to the backup. And that goes a long way into preventing data loss due to bit rot.
        
        Parent Share
        twitter facebook
  - Re:Legacy file systems should be illegal (Score:4, Interesting)
    
    by Chandon Seldon ( 43083 ) writes: on Saturday June 14, 2014 @09:46AM (#47236035) Homepage
    
    Btrfs (at least) can store multiple copies on one disk and use a checksum to identify the good copy to read. Obviously more disks is better, but...
    
    Parent Share
    twitter facebook
    - Re:Legacy file systems should be illegal (Score:4, Informative)
      
      by mgmartin ( 580921 ) writes: on Saturday June 14, 2014 @09:54AM (#47236083)
      
      As does zfs: man zfs
      copies=1 | 2 | 3 Controls the number of copies of data stored for this dataset. These copies are in addition to any redundancy provided by the pool, for example, mirroring or RAID-Z. The copies are stored on different disks, if possible. The space used by multiple copies is charged to the associated file and dataset, changing the used property and counting against quotas and reservations. Changing this property only affects newly-written data. Therefore, set this property at file system creation time by using the -o copies=N option.
      
      Parent Share
      twitter facebook
  - Re: (Score:3)
    
    by gweihir ( 88907 ) writes:
    
    Indeed. And only regular checking can detect it (ling SMART self-test and RAID consistency check every 7-14 days). The OP simply is naive and did not bother to find out how to properly archive data.
- Re:Legacy file systems should be illegal (Score:5, Insightful)
  
  by jbolden ( 176878 ) writes: on Saturday June 14, 2014 @09:47AM (#47236045) Homepage
  
  Yes absolutely great idea! Rather than having technical decisions being made at tech conferences and among developers, system administrators and analysts we should move that authority over to legislature. Because we all know we are going to see a far better weighing of the costs and benefits of various technology choices by the legislature than by technology marketplace.
  Apple used HFS+ because it worked to successfully migrate people from Mac OS9, it supported a unix / MacOS hybrid. They continue to use it because it has been good enough and many of the more robust filesystems were pretty heavyweight. I'd like something like BTFS too. But I don't think the people who disagree with me should be jailed.
  
  Parent Share
  twitter facebook
  - - Re: (Score:2)
      
      by kthreadd ( 1558445 ) writes:
      
      HFS+ was just an extension to HFS, which goes back to the System 2 days. HFS suffered from a number of limitations which made in unsuitable on volumes larger than 2 GB.
      - Re: (Score:3)
        
        by NJRoadfan ( 1254248 ) writes:
        
        Think of HFS+ as the equivalent of FAT32 for Macs. Its basically the old file system with support for larger drives and files. Apple latter tacked on journaling in OS X 10.3. I'm surprised Apple didn't push for a replacement file system after the switchover to Intel CPUs.
        
        Re: Legacy file systems should be illegal (Score:4, Informative)
        
        by maccodemonkey ( 1438585 ) writes: on Saturday June 14, 2014 @03:57PM (#47237569)
        
        They did try and replace the file system around the time of the Intel switch. Got killed by licensing problems.
        http://appleinsider.com/articl... [appleinsider.com]
        
        Parent Share
        twitter facebook
        
        Re: (Score:3)
        
        by Guy Harris ( 3803 ) writes:
        
        10.5 added hardlinking.
        Are you certain? The ln command, when run without -s, would return an error if you used it under Tiger or earlier?
        Or are you referring to hardlinking to directories, which was something UNIX traditionally supported, but which required root permissions (as it was used by the mkdir command to create the . and .. directories), and which was removed at one point (4.2BSD, as that added the mkdir() system call, making the ability of link() to link to a directory unnecessary?), and added back in 10.5 with the in
- Re:Legacy file systems should be illegal (Score:5, Interesting)
  
  by peragrin ( 659227 ) writes: on Saturday June 14, 2014 @10:04AM (#47236113)
  
  This is something everyone forgets.
  It takes decades to build long term reliable file systems.
  ZFS, BTFS, are less than a decade old.
  Windows runs on NTFS Version something. NTFS was started in what year?
  HFS, and then HFS + was built in what year?
  How long has Microsoft been promising WinFS?
  File systems change but only slowly. This is good. you need a good long track record to convince people they won't lose files every ten years due to random malfunctions.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by nine-times ( 778537 ) writes:
    
    How long has Microsoft been promising WinFS?
    I thought Microsoft gave up on WinFS. Are they still promising it?
  - - Re: (Score:2)
      
      by the_B0fh ( 208483 ) writes:
      
      Bullshit. I was running up through 10.4 or 10.5 on a PowerMac G4 450mhz. It was more responsive (I didn't say faster) and usable than the dual 1.4Ghz Pentium 3 I had.
- Re: (Score:3)
  
  by gweihir ( 88907 ) writes:
  
  Bullshit. Anybody doing competent archiving will either use professional archive-grade tape or spinning disks in RAID that gets checked frequently and with a second copy on a geographically independent location. I do that and my loss statistics for the last 14 years is exactly zero. I do have to replace a disk about once every 2 years in the 3-way RAID1 I use as primary archive though. This RAID runs with full SMART self-test every 7 days and RAID consistency check (full data compare) every 14 days. Expecti
- - - Re: Legacy file systems should be illegal (Score:4, Informative)
      
      by the_B0fh ( 208483 ) writes: on Saturday June 14, 2014 @10:58AM (#47236357) Homepage
      
      Not if your OS is tied intimately to your filesystem. Linux might not, because a large number of things are abstracted out, but FreeBSD depends on its file system, Solaris took a very long time/effort before it could boot off ZFS. Forget about moving Windows off NTFS. Apple actually did some work on putting it onto ZFS, maybe they will continue.
      
      Parent Share
      twitter facebook
      - Re: (Score:3)
        
        by jbolden ( 176878 ) writes:
        
        The developer involved left Apple, went off to found his own company. Completed the work and then got acquired by Oracle. http://getgreenbytes.com/solut... [getgreenbytes.com]
        So it is in some sense done. The question is whether Apple is going to buy an Oracle product or Oracle will sell or ...
- Re: (Score:2)
  
  by kthreadd ( 1558445 ) writes:
  
  The point is that there are good file systems that can detect when the storage unit fails, give you an alert and allow you to restore the file from a good backup. Without this feature the corrupted file will just get backed up like any other file and eventually replace the good backup.
- - Re: (Score:2)
    
    by wagnerrp ( 1305589 ) writes:
    
    Sufficiently advanced RAID implementations will carry checksums of those blocks for exactly that purpose.
  - Re: (Score:2)
    
    by the_B0fh ( 208483 ) writes:
    
    ZFS raid does that.
- Re: (Score:2)
  
  by kthreadd ( 1558445 ) writes:
  
  How could the RAM be responsible for damaging a file between the time it was written to disk and when it was read from disk?
  - Re: (Score:2)
    
    by Qzukk ( 229616 ) writes:
    
    The RAM is responsible for damaging the file while it sits in a buffer waiting to be written to the disk in the first place.
  - Re: (Score:2)
    
    by washu_k ( 1628007 ) writes:
    
    Bad RAM could have corrupted the file as it was being written to disk. The file is corrupted all along, but not the disk/filesystem's fault
    
    Or the file could have been corrupted in RAM on read, and would actually be fine if read on a working machine.
    
    Or the disk has been replaced in those 6 years and the file was corrupted during the copy because of bad RAM
    
    There are lots of possibilities for the file to get corrupted that don't involve the disk or filesystem.
  - Re: (Score:2)
    
    by v1 ( 525388 ) writes:
    
    I see bad RAM cause two problems. First, when you are copying a file or editing it, and it gets saved, if the data was corrupted while it was in memory, it can become damaged when writing it. It doesn't have to affect the part of the file you were working with. If you were adding to the end of a long text document, page 2 could get damaged when you hit Save.
    Second problem, more common in my experience, is directory corruption due to bad ram. When a machine would come in with a trashed directory, we used
    - Re: (Score:2)
      
      by Antique Geekmeister ( 740220 ) writes:
      
      "rm" doesn't follow symlinks. However, if you have a symlink that is a directory, and hit "tab" to complete the link's name, it will put a dangling "/" on the link name. _That_ is referencing the directory from effectively "inside" the actual target directory.
      I've had several conversations with colleagues over why just hitting 'tab for completion' can be hazardous. This is one of the particular cases.
      - Re: (Score:2)
        
        by kthreadd ( 1558445 ) writes:
        
        That depends on your shell. Bash works that way, but zsh does not; at least not by default as far as I know.
- Re:HFS reliability (Score:4, Insightful)
  
  by sribe ( 304414 ) writes: on Saturday June 14, 2014 @11:05AM (#47236385)
  
  Anyone who owned a Mac since the 80s remembers having to use Norton Disk Doctor and later DiskWarrior at least once per month to repair the filesystem. Entire folders could go randomly missing each time you booted up your Mac, and if you accidentally lost power to your hard drive, the use of one of those was mandatory.
  Oh yes. I remember those days well. Journaled HFS+ fixed that, and for about the last decade the only times I have encountered a corrupted file system on a Mac, that discovery was followed shortly by total failure of the hard disk.
  So, what was your fucking point?
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by Smurf ( 7981 ) writes:
  
  Anyone who owned a Mac since the 80s remembers having to use Norton Disk Doctor and later DiskWarrior at least once per month to repair the filesystem. Entire folders could go randomly missing each time you booted up your Mac, and if you accidentally lost power to your hard drive, the use of one of those was mandatory.
  No, not "anyone who owned a Mac since the 80s...". My first Mac was a Mac Plus bought in 1987 (IIRC), and I have never used those tools nor experienced the problems you mention.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Yes, he did it to himself by making assumptions he liked and zero verification whether they hold up in the real world. Now he blames others for his stupidity.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

I've also had this happen with HFS+ (Score:5, Informative)

Re:I've also had this happen with HFS+ (Score:5, Insightful)

Re:I've also had this happen with HFS+ (Score:4, Insightful)

Backup? (Score:4, Insightful)

Re:Backup? (Score:5, Insightful)

Re:Backup? (Score:5, Insightful)

Re: (Score:2, Informative)

Re: (Score:3)

Re: (Score:2, Informative)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:3)

Comment removed (Score:5, Insightful)

Re: (Score:2)

reading checksum + n blocks is SLOW (Score:2)

Re: (Score:3)

Re: Bitrot not the fault of filesystem (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Good backups aren't enough (Score:2)

ZFS, Apple! (Score:3)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Isn't Samsung the largest UNIX vendor? *grin* (Score:2, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Detachment (Score:2)

Re: (Score:2)

Re: (Score:3)

So far what I lost... (Score:5, Interesting)

Re: (Score:2)

Re: So far what I lost... (Score:2)

And the story is? (Score:4, Insightful)

Clickbait generating shit (Score:2)

article is suspect, summary is worse (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Clueless article (Score:5, Informative)

Re: (Score:3)

Btrfs (Score:2)

Re: (Score:2)

So answer me this... (Score:2)

It's all about ERROR rates (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

how is this a file system problem? (Score:2)

Re: (Score:2)

Re: (Score:3)

Those that are incompetent will lose their data (Score:2)

Re: (Score:3)

Re: (Score:3)

Re: (Score:2, Informative)

Re:Legacy file systems should be illegal (Score:5, Insightful)

Re: (Score:2)

Re: Legacy file systems should be illegal (Score:4, Informative)

Re: Legacy file systems should be illegal (Score:5, Interesting)

Re:Legacy file systems should be illegal (Score:4, Interesting)

Re:Legacy file systems should be illegal (Score:4, Informative)

Re: (Score:3)

Re:Legacy file systems should be illegal (Score:5, Insightful)

Re: (Score:2)

Re: (Score:3)

Isn't Samsung the largest UNIX vendor? grin (Score:2, Informative)