Ask Slashdot: What's a Good Tool To Detect Corrupted Files?

Soulskill posted about 2 years ago | from the easy,-just-d%wn!o@%-Km3#-r*(;. dept.

Data Storage 247

Volanin writes "Currently I use a triple boot system on my Macbook, including MacOS Lion, Windows 7, and Ubuntu Precise (on which I spend the great majority of my time). To share files between these systems, I have created a huge HFS+ home partition (the MacOS native format, which can also be read in Linux, and in Windows with Paragon HFS). But last week, while working on Ubuntu, my battery ran out and the computer suddenly powered off. When I powered it on again, the filesystem integrity was OK (after a scandisk by MacOS), but a lot of my files' contents were silently corrupted (and my last backup was from August...). Mostly, these files are JPGs, MP3s, and MPG/MOV videos, with a few PDFs scattered around. I want to get rid of the corrupted files, since they waste space uselessly, but the only way I have to check for corruption is opening them up one by one. Is there a good set of tools to verify the integrity by filetype, so I can detect (and delete) my bad files?"

247 comments

Gamemaker sucks ass (-1)

Anonymous Coward | about 2 years ago | (#39918429)

Take that, you stupid gamemaker spammers!

Re:Gamemaker sucks ass (2, Funny)

binarylarry (1338699) | about 2 years ago | (#39918595)

Have some respect, the man just lost his entire porn stash.

Re:Gamemaker sucks ass (0)

jkflying (2190798) | about 2 years ago | (#39918791)

Including all of his erotica e-books. Tough life, dude.

Re:Gamemaker sucks ass (5, Funny)

Volanin (935080) | about 2 years ago | (#39918823)

Author here:

Ok, I could deal with the loss of some unique videos and pictures from travels... but now that you mention the porn... *weep*

Re:Gamemaker sucks ass (-1)

Anonymous Coward | about 2 years ago | (#39918909)

You little Gamemakerless supremacy! Why do you deny the truth!? Gamemaker is the next level of programming! True Programmers use Gamemaker. You can't be a True Programmer without Gamemaker.

Gamemaker is turing complete, abstract, will boost your PC & internet speed, and is a joy to program in!

Re:Gamemaker sucks ass (0)

Anonymous Coward | about 2 years ago | (#39919013)

I'm not using Gamemaker until it can also clean the kitty litter, go to the grocery store for me and find porn automatically for me!

Your eyes (-1, Redundant)

courteaudotbiz (1191083) | about 2 years ago | (#39918433)

And open the file. If corrupted, it will not look as it should. ;-)

Re:Your eyes (0)

dmacleod808 (729707) | about 2 years ago | (#39918733)

TL;DR Summary. in a quest to be #, Mr. Courteaudotbiz forgoes reading the summary to post Snarky Comment #1(TM)

Re:Your eyes (2)

cpu6502 (1960974) | about 2 years ago | (#39918893)

Perhaps but I agree with the first post. Going through and simply looking at all the JPEGs or MPEGs is probably the only way to tell if a file is corrupted (I wouldn't trust the CPU to do an accurate job). Also gives you a chance to erase a lot of stuff you really don't need anymore. I dumped 300 gig off my drive simply by going through everything... took a while but it was worthwhile to get rid of old shows/movies I'll likely never watch.

easy (-1)

Anonymous Coward | about 2 years ago | (#39918443)

Both Gamemaker and using a HOSTS file will do the job.

compare them to an intact backup (0)

allo (1728082) | about 2 years ago | (#39918467)

Then you can see which files have changed. Then check whether each changed file is much smaller or corrupted.

Re:compare them to an intact backup (2)

Pokermike (896718) | about 2 years ago | (#39918609)

And even though your last backup is from August, this will still constrain the number of files you potentially have to eyeball.

Re:compare them to an intact backup (2, Insightful)

Anonymous Coward | about 2 years ago | (#39918727)

Consider the possibility that the backup already contains corrupted files. I once had defective RAM where only one bit flipped occasionally. The machine was quite stable, so the defect went undetected and over a couple of months it silently corrupted hundreds of files. Unless he finds out what caused the crash, he can't be sure that the backup is alright.

Re:compare them to an intact backup (5, Insightful)

Calos (2281322) | about 2 years ago | (#39918761)

Well...

My first suspicion would be that the filesystem is messed up, not the actual files. Unless s/he had a lot of pending writes to all of these files, there is no reason that something should have actually overwritten or garbled them when the power shut down. Much more likely was an impending or in-progress write to the filesystem's tables, which has affected where it thinks all the files' pieces are stored. And if that is the case, date modified and size may be irrelevant because those are going to be reported by the filesystem.

Aside from trying to read back sector-by-sector data and assembling them, however, I don't know that there's a remedy.

Newbie question hour? (-1, Troll)

Anonymous Coward | about 2 years ago | (#39918473)

> Last backup August
> Thinks there is a way to detect generic file corruption

lol

Re:Newbie question hour? (5, Informative)

Volanin (935080) | about 2 years ago | (#39918661)

Author here:

> Last backup August.
Yes, that was silly of me.

> Thinks there is a way to detect generic file corruption
There is no way to detect generic file corruption. But there is a way to detect specific filetype corruption. For example, I already found mp3val, which can scan all my MP3s, check file integrity, and even fix a few kinds of corruption (such as mismatched bytes in the header and sound chunks). Maybe with the right set of tools, I might detect (or even fix) my corrupted pictures, movies and books as well.

Re:Newbie question hour? (2, Insightful)

Anonymous Coward | about 2 years ago | (#39918959)

Let me ask a stupid question since I've never run a battery out on a machine running Ubuntu. Why did this happen? Running OSX or Windows, the machine would have hibernated safely before the battery ran out. Does Ubuntu not do this and it just dies? Or is this something you configured to act this way? If it is default behavior in Ubuntu it is something they ought to fix.

Re:Newbie question hour? (1)

frisket (149522) | about 2 years ago | (#39919161)

Ubuntu pops up a warning window, and if you ignore it the battery light turns orange, and then red, and then it should hibernate. Flat-out dying is not something I've come across under Ubuntu (and I have some flaky old machines with old batteries, and they still warn me and then shut down).

Re:Newbie question hour? (4, Interesting)

loftwyr (36717) | about 2 years ago | (#39919021)

mplayer can detect corrupted movie and audio files:

find . -name '*.mov' -exec mplayer -msglevel all=6 -speed 100.0 -framedrop -nogui -nolirc -cache 8192 -tskeepbroken -ao null -vo null {} \; | grep Warning! > $1.txt

Change the *.mov as appropriate.

Re:Newbie question hour? (1)

StikyPad (445176) | about 2 years ago | (#39919119)

Look, you're really taking the wrong approach here. The way to deal with corruption is avoidance, backup, and corrective action.

1) Avoidance. This is generally the role of the filesystem and the underlying hardware, each of which has methods for preventing and correcting data corruption without ever involving the user. The user has a small part to play by doing things like shutting down instead of turning off whenever possible, though journaling filesystems (i.e., all modern filesystems) will know when a file operation was interrupted prematurely and check the integrity automatically. Also try not to put different file systems and OSes on the same drive, since there's the possibility that one OS may not respect the FS or limits of another (typically/historically, Windows has been the culprit here, but not always, and not so much anymore). Any OS will generally leave an unrecognized drive alone unless you tell it to do otherwise, but the system drive has often been considered fair game.

2) Backups (optional). Once you have a known-good (or believed good) installation, create your backup. Repeat somewhere between often (if your data is important) and never (if it's not).

3) Correction. If and when you come across data corruption, that's not a sign that you're wasting space on your hard drive; it's a sign that something is seriously wrong. The proper course of action is to identify the underlying cause and correct it, not to delete the files to free up space. If you're experiencing corruption on only one drive regardless of channel and cable, replace the drive. If you're consistently having problems on a given channel, then don't use that channel. If you're having random issues across all drives on all channels, then the chipset is bad and the motherboard should be replaced. Basic troubleshooting.

Technically you *could* take a checksum of all of your files and update the database every time a file is changed. Some antivirus systems already do this to detect infections, but it would also detect incidental changes as well. The problem is that constantly verifying the integrity of your files will only hasten the demise of your storage medium. It's a self-fulfilling prophecy.

Re:Newbie question hour? (0)

Anonymous Coward | about 2 years ago | (#39919187)

There is an oldie out there called cleanjpg.exe which does the same for jpg's.

AppleScript (3, Interesting)

noh8rz3 (2593935) | about 2 years ago | (#39918477)

An AppleScript / Automator script can step through files on a hd, open them, and catch a thrown error if the open fails. Tis sits a good automated way to glad the bad ones. Not the fastest method, but it could run at night.

you seem to be surprisingly ok with the fact that your computer crashed and all your documents and media were corrupted, as was your backup. I would have been beside myself. Hulk smash! Please let us know what different set ups you're exploring to avoid this.

Re:AppleScript (0)

Anonymous Coward | about 2 years ago | (#39918541)

Tis sits a good automated way to glad the bad ones.

What? Are you sure your computer isn't crapping out on you too?

Re:AppleScript (3, Insightful)

dgatwood (11270) | about 2 years ago | (#39918613)

But the open usually won't fail. Unless the error is within the header bytes of a movie or image, the media will open, but will appear wrong. Worse, there is no way to detect this corruption because media file formats generally do not contain any sort of checksums. At best, you could write a script that looks for truncation (not enough bytes to complete a full macroblock), or write a tool that computes the difference between adjacent pixels across macroblock boundaries and flags any pictures in which there is an obvious high energy transition at the macroblock boundary, but even that cannot tell you whether the image is corrupt or simply compressed at a low quality setting with lots of blocking artifacts.

The short answer, however, is "no". Such corruption can't usually be detected programmatically.
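For the truncation case mentioned above, a minimal Python sketch: it only checks that a JPEG starts with the start-of-image marker (FF D8) and ends with the end-of-image marker (FF D9). The function names are made up for illustration, and this is a hedged heuristic, not a validator: it says nothing about damage in the middle of the file, and a file with unusual trailing data after the end marker would be flagged even though it may display fine.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def jpeg_looks_truncated(path):
    """Return True if the file lacks the JPEG start or end marker."""
    data = Path(path).read_bytes()
    if not data.startswith(SOI):
        return True
    # Some writers pad the file after EOI, so ignore trailing NULs and whitespace.
    return not data.rstrip(b"\x00\r\n ").endswith(EOI)


def find_suspect_jpegs(root):
    """Yield .jpg/.jpeg files under root that fail the marker check."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".jpg", ".jpeg"}:
            if jpeg_looks_truncated(path):
                yield path
```

Anything it reports is a candidate for a manual look rather than automatic deletion.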

Re:AppleScript (1)

dgatwood (11270) | about 2 years ago | (#39918683)

I should clarify. If you are intimately familiar with the format, and if it is a multi-frame format, such as a compressed audio or video format, it is possible to programmatically detect that there are frames that reference illegal frames, frames whose structure is not valid, etc. in much the same way that you can detect a JPEG file whose header is invalid.

Again, though, none of this will be caught by merely opening the movie; the movie will generally play correctly up until the decoder encounters the error, at which point it may recover and continue playing content after the gap, or it may just choke and die. Either way, detection isn't something that can be easily automated.

Re:AppleScript (2)

vlm (69642) | about 2 years ago | (#39918873)

The TLDR version is this scenario is why you configure your mythtv box to store MPEG TS which have embedded CRC error detection and recovery instead of MPEG PS which are irrelevantly smaller, if you have the option.

Re:AppleScript (1)

K. S. Kyosuke (729550) | about 2 years ago | (#39918939)

Doesn't MPlayer report most file corruptions to stdout or stderr even if the playback continues? You should be able to grep for it. Granted, it isn't bulletproof, but I often get warnings even if the playback seems fine - it seems to be sensitive. I don't think it would ignore jumbled sectors.

file(1) (1)

Anonymous Coward | about 2 years ago | (#39918481)

If the entire contents of the files are messed up, you could write a quick script that compares the output of file(1) to the file extension. I wouldn't call this high-fidelity - I'd recommend using this to generate a list you go through by hand - but it's at least a starting place.
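A rough Python sketch of the same idea without calling file(1): compare the first bytes of each file to the signature expected for its extension. The MAGIC table and the function name are assumptions made for illustration; the table is small and hand-picked (MOV and MPG are left out on purpose, and the MP3 entries cover only an ID3 tag or a few common frame-sync bytes), so treat a mismatch as "look at this one by hand".

```python
from pathlib import Path

# Leading "magic" bytes for a few formats (a small, non-exhaustive table).
MAGIC = {
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".pdf": [b"%PDF-"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".mp3": [b"ID3", b"\xff\xfb", b"\xff\xfa", b"\xff\xf3", b"\xff\xf2"],
}


def header_matches_extension(path):
    """Return True or False for known extensions, None if the type is not in MAGIC."""
    signatures = MAGIC.get(Path(path).suffix.lower())
    if signatures is None:
        return None
    with open(path, "rb") as handle:
        head = handle.read(16)
    return any(head.startswith(sig) for sig in signatures)
```

Like file(1), this only looks at the header, so it catches files that were replaced by garbage or zeros, not damage further in.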

Re:file(1) (3, Informative)

Volanin (935080) | about 2 years ago | (#39918785)

Author here:

At first I thought this idea wouldn't work. As some people have already written here, the 'file' command sometimes just checks for a few bytes. But since it is so easy to implement, why not give it a try? And indeed, for videos it worked quite well. Some of the corrupted MOV files were detected simply as 'data file' or even 'MPEG sequence' and were promptly deleted! Thank you for the idea.

BSOD? (1)

G3ckoG33k (647276) | about 2 years ago | (#39918487)

"What's a Good Tool To Detect Corrupted Files?"

BSOD?

Re:BSOD? No, use open source "Tripwire" (3, Informative)

quarkscat (697644) | about 2 years ago | (#39918767)

Not the BSOD.
If the OP had used open source "tripwire" on known-good files in each filesystem on his Macbook, and saved the resultant data output to a USB thumbdrive formatted with FAT32, the OP would have had a good chance of determining all corrupted files. In this case, an ounce of prevention would have prevented several pounds of "cure".

Check out http://tripwire.org./ [tripwire.org.]

Linux Command: file (1, Insightful)

Anonymous Coward | about 2 years ago | (#39918497)

Try running "file" from a command line on a few files you know to be corrupt. If the file command tells you the same, you could run a quick bash script to loop through the files and spit out the names of the bad ones. This is all assuming you know what you are doing with shell scripting.

CRC32 or any other quick hash of the files (0)

Anonymous Coward | about 2 years ago | (#39918509)

Compare files hash to known hash of good file.

This might help (0)

Anonymous Coward | about 2 years ago | (#39918519)

I'm not entirely sure if this will apply to your setup because corruption is kinda sporadic but....

Way back in my IRC days I ran an Fserv, DCC transfers between different versions of MIRC was a nightmare and quite often ended up corrupting files.

What I did, as a half-assed method, was write a simple batch script (this was a long time ago) that scanned my folders and created a text file listing each file, its extension, and most importantly, its size. I'd try to run that script daily, but you could easily automate it with scheduling.

Then I wrote another simple script to compare the current folder contents with the latest list I created; any differences in file sizes would be spit out to another text file. This would be basic command line stuff in Linux, but again, this only catches the files that have changed size since the scan. It's not a foolproof way to detect corruption.
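The two scripts described here could look roughly like this in Python. This is a sketch under assumptions: the function names are invented, and the snapshot is kept as an in-memory dict rather than a text file. As the post says, it only notices files whose size changed or that disappeared.

```python
import os


def snapshot_sizes(root):
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def changed_sizes(old, new):
    """Return sorted paths from the old snapshot whose size differs or which are gone."""
    return sorted(path for path, size in old.items() if new.get(path) != size)
```

Saving the snapshot with the json module and running the comparison from a scheduler would be the obvious next step.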

Re:This might help (1)

Anonymous Coward | about 2 years ago | (#39918535)

That won't help detect corruption, only truncation of files. You would need an md5 or similar hash.

Re:This might help (1)

vlm (69642) | about 2 years ago | (#39918735)

That won't help detect corruption, only truncation of files. You would need an md5 or similar hash.

md5 is (relatively) slow. a simple CRC-32 will only fail you for 1 in 2 ** 32 corruptions, and I suspect the guy doesn't even have 2 ** 16 files so the odds are CRC-32 is more than good enough and significantly faster.

Then again, he's probably going to be hard drive speed limited not CPU limited. Then again, no point wasting laptop battery on an overly complicated algorithm. CRC32 is gonna use at least 1/5th the CPU/wallclock time and/or battery of md5.

The tradeoff boils down to this: you can use md5 and burn at least 5 times more battery/heat/wall clock time (whichever is your limiting reagent) in exchange for a 2 ** (128-32) = 2 ** 96 times lower likelihood of a mistake. The problem with paying for 2 ** 96 higher reliability is that his dying hard drive probably cannot provide even 2 ** 16 reliability, so a stronger algorithm is a waste since you're already limited by the hardware.
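To make the numbers concrete, a small Python sketch using the standard zlib and hashlib modules. The helper names are made up for illustration. It shows that both checksums change when a single bit is flipped, and that the ratio between a 128-bit and a 32-bit digest space is 2 ** 96. It does not measure speed or battery use.

```python
import hashlib
import zlib


def crc32_hex(data):
    """CRC-32 of data as 8 hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def md5_hex(data):
    """MD5 of data as 32 hex digits."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data, bit_index):
    """Return a copy of data with one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


# Ratio between the two digest spaces: 2**128 / 2**32.
DIGEST_SPACE_RATIO = 2 ** (128 - 32)
```

For accidental corruption either one will do; the stronger hash matters mainly when someone might tamper with files on purpose.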

Re:This might help (2)

flargleblarg (685368) | about 2 years ago | (#39918937)

md5 is (relatively) slow. a simple CRC-32 will only fail you for 1 in 2 ** 32 corruptions, and I suspect the guy doesn't even have 2 ** 16 files so the odds are CRC-32 is more than good enough and significantly faster.

Not true. At all. On modern systems, MD5 is just as fast as CRC-32 because the disk is the bottleneck, not the CPU.

The BEST method.. (5, Funny)

Anonymous Coward | about 2 years ago | (#39918527)

is urgency. Corrupted files have the ability to detect urgency and your discovery of them will come in a form compatible with the laws of Murphy.

WHAT GOOD TOOL??? (0)

Anonymous Coward | about 2 years ago | (#39918555)

What ever. Look at /. advertisers. Why are you pestering us with mundane marketing questions?

I know - /. is a marketing company now ... gathering marketing data.

My Bad.

No easy answer (1, Insightful)

gstrickler (920733) | about 2 years ago | (#39918557)

1. Compare to backup, files that match are ok.
2. AppleScript option others mentioned may help reduce it further.
3. Backup regularly, and verify your backup procedure.
4. Anything else will cost you consulting rates.

Re:No easy answer (0)

Anonymous Coward | about 2 years ago | (#39918711)

Yes there is. I used a 15lb. sledgehammer on my eslate that I suspected had corrupt files and I was right. They were all corrupted.

Fake (0)

Anonymous Coward | about 2 years ago | (#39918967)

Yeah right, like a slashdot member could even lift a 15 lb. sledgehammer!

Re:No easy answer (0)

interval1066 (668936) | about 2 years ago | (#39919191)

Perl or bash will do this quite easily: run a hash compare of the two files, and if they don't match, delete the bad file. Is this a serious question?

For MP3s use amp3test.exe (5, Informative)

denis-The-menace (471988) | about 2 years ago | (#39918567)

2000-2001 MAF-Soft http://www.maf-soft.de/ [maf-soft.de]
The version I have is v1.0.3.102

It can scan single mp3s and entire folders structures for defects and logs everything if you wish. It will give you a percentage of how good the file is.

Depending on the damage you may be able to fix headers and chop off corrupted tag info with something like a MP3Pro Trim v1.80.exe

Mod parent up (0)

Anonymous Coward | about 2 years ago | (#39918625)

After one page of nonsense and replies from people who didn't even bother to read the question, finally a useful answer!

Re:Mod parent up (-1, Flamebait)

spire3661 (1038968) | about 2 years ago | (#39918779)

Just because people aren't telling you what you want to hear doesn't mean we didn't read your question. In short you are an idiot for not hashing/CRCing your backups to begin with and for having a 9 month old backup. You have a shit sandwich and are asking us where you should start on it and we are telling you that it's a sandwich of shit; it doesn't matter where you start, it is going to be a fucking mess. If your backup wasn't so old you could restore from it and call it a day instead of playing digital archaeologist.

diff (0)

Anonymous Coward | about 2 years ago | (#39918587)

use the 'diff' command between your backups and the originals.

diff -rq /backuplocation /originallocation could work a treat.

Switches:
r = recursive
q = tell only if files differ

Re:diff (2)

hoggoth (414195) | about 2 years ago | (#39918891)

These comments are full of 'helpful' suggestions to compare to backup or to md5's generated from the backups.
That makes no sense.
If he has a good set of backups JUST RESTORE THE BACKUPS to get known good files back. Why would you read every backup file and every current file, then compare them, then make a list of ones that don't match just to restore the backups. Restore them all. done.

Can you compare to backup? (1)

Anonymous Coward | about 2 years ago | (#39918589)

Suppose your volume is mounted under /mnt/a and your backup is mounted under /mnt/b. Something like this should work:

find /mnt/a -type f -mtime -2 | sed 's|^/mnt/a/||' | while IFS= read -r f ; do

cmp /mnt/a/"$f" /mnt/b/"$f"

done

That should find all files which have been modified within the past 2 days and differ from your backup, which will help narrow down your search. I don't know of a tool that will address your specific question about testing for integrity for particular file formats. For specific file formats, you can automate this, of course, like using ImageMagick for image files, but I don't know of a tool that will just do everything. It shouldn't be hard to write a script to look at the extension and the output of the "file" command and determine which tool to use to automatically check integrity for that specific file format.
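The same comparison as a cross-platform Python sketch, using the standard filecmp module. The function name and the two root arguments are placeholders, and it skips the "modified in the past 2 days" filter, so it compares every file that exists in both trees.

```python
import filecmp
import os


def differing_files(current_root, backup_root):
    """Return relative paths present in both trees whose contents differ."""
    differing = []
    for folder, _dirs, files in os.walk(current_root):
        for name in files:
            current = os.path.join(folder, name)
            rel = os.path.relpath(current, current_root)
            backup = os.path.join(backup_root, rel)
            # shallow=False compares file contents instead of relying only
            # on the os.stat() signature (type, size, modification time).
            if os.path.isfile(backup) and not filecmp.cmp(current, backup, shallow=False):
                differing.append(rel)
    return sorted(differing)
```

A file that differs is not necessarily corrupt (it may simply have been edited since the backup), so the result is a shortlist, not a delete list.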

md5sum (3, Interesting)

sl4shd0rk (755837) | about 2 years ago | (#39918593)

or sha1sum if you prefer. Automate in cron against a list of knowns.

eg:
$ md5sum /home/wilbur/Documents/* > /home/wilbur/Docs.md5
$ md5sum -c /home/wilbur/Docs.md5
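For the Windows and MacOS sides, where md5sum may not be installed, roughly the same workflow can be sketched in Python with hashlib. The function names are invented for illustration, SHA-256 is swapped in for MD5 here, and the manifest is a plain dict rather than a .md5 file.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify_manifest(root, manifest):
    """Return sorted paths that are missing or whose digest no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

As with md5sum -c, this only helps if the manifest was built while the files were still known to be good.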

backups, backups, backups (1)

ballyhoo (158910) | about 2 years ago | (#39918597)

If you're talking about recovery tools, you're already on the wrong track. A Time Capsule costs $300. How much is your data worth? How much are the tools going to cost to recover it? How much is your time worth? I'll bet that the sum of those last three things is a whole pile more than 300 bucks.

If I were you, the thing I'd buy right now is a good backup solution. Re: your existing data, take a full image of your hard disk and take your time recovering it.

Once you've got a new backup system, you can then sit there with a big smile on your face and comment smugly on all future /. posts about data loss.

Have I lost data? Hell yeah. And it will never happen again.

-bh

Re:backups, backups, backups (1)

spire3661 (1038968) | about 2 years ago | (#39918841)

This is pretty much the overriding sentiment. OP is going to get smugness from us because we've all been there and we all know there is no substitute for vigilance. He failed in his vigilance. I'm not trying to be a dick but rather to really drive home that backups are serious and you should treat them as such. Data > hardware ALWAYS.

P.S. Synology NAS > Time Capsule by an order of magnitude.

Re:backups, backups, backups (1)

AshtangiMan (684031) | about 2 years ago | (#39918971)

Agreed. My data loss happened from theft, and the backup was stolen as well. Now my backup drive sits hidden away, wirelessly capturing my backups. Time Capsule is a good solution, but there are others. I just bought a 2TB external drive for $160, which combined with a wireless router that has a USB port could be a less expensive alternative. I'm actually thinking that the 2TB drives might not be a good backup solution, and am looking into building a NAS specifically for backup using four 500GB drives in a RAID 5 configuration (I know RAID isn't backup, but that doesn't mean your backup can't be a RAID array).

double-click on the file icon in Explorer (0)

Anonymous Coward | about 2 years ago | (#39918599)

If you see IE come up, followed by several dozen browser tabs in split-second progression, followed by a lot of message boxes proclaiming "Warning! This computer is infected by trojan pirate hacker malware viruses!". And you later open your inbox and see a lot of spam from debt remediation services.

That's a bad sign. I would delete the file.

A question about NTFS versus other file systems... (0)

Anonymous Coward | about 2 years ago | (#39918601)

An honest question :

I've had several crashes over the years with Windows XP but the files, data and system files were never corrupted.
In linux it seems that file systems are not very resilient, and the least crash can corrupt your files.
Is NTFS such a good well designed file system compared to linux file systems ?

Re:A question about NTFS versus other file systems (1)

lordbeejee (732882) | about 2 years ago | (#39918663)

Not sure if it's the HFS+ filesystem, I let my battery run out a lot while working on my ubuntu laptop (ext4 fs), never have a prob.

Perception bias (1)

tobiah (308208) | about 2 years ago | (#39918763)

I've certainly seen corruption with XP crashes, not a big deal because I do backup. About the same with the other file systems. In this case he was using Mac OS 10.7 Lion, which is a mess, and two others accessing the same partition. Not surprised.

Re:A question about NTFS versus other file systems (1)

Githaron (2462596) | about 2 years ago | (#39918833)

An honest question :

I've had several crashes over the years with Windows XP but the files, data and system files were never corrupted. In linux it seems that file systems are not very resilient, and the least crash can corrupt your files. Is NTFS such a good well designed file system compared to linux file systems ?

Linux supports a wide array of filesystems. Which ones have you used? I have used ext3 and ext4 and have never run into file corruption problems. Both of those are journaling filesystems. Journaling filesystems help prevent corruption in the event of power failure.

Besides the filesystem, one other possibility for corrupted files is a bad hard drive. I know someone who reinstalled Windows on his desktop on a regular basis because key files would go missing or get corrupted. I took a look at it and found out that he simply had a bad hard drive. After getting a new one, he didn't have any more problems.

Re:A question about NTFS versus other file systems (1)

0123456 (636235) | about 2 years ago | (#39918889)

In linux it seems that file systems are not very resilient, and the least crash can corrupt your files.
Is NTFS such a good well designed file system compared to linux file systems ?

I've never had corrupt files after a Unix crash; be it SunOS, Solaris, HP-UX, Linux or any of the other Unix variants I've used.

I've never had corrupt files after an XP crash, but I've often had scandisk delete files, including a multi-gigabyte game installer that I'd just downloaded before it crashed. It regularly deleted Firefox bookmarks before they switched from storing them in big HTML files.

The NTFS approach appears to be 'I'll guarantee file system consistency but won't guarantee any of your files are still there'. I'm sure you can find similar Linux filesystems, but the most common ones don't seem to have any problems.

For JPEGs (4, Informative)

Jethro (14165) | about 2 years ago | (#39918603)

You can run jpeginfo -c. I have a script that runs against a directory and makes a list for when I do data recovery for all my friends who don't listen when I tell them their 10 year old laptop may be dying soon.

Use mtree (0)

Anonymous Coward | about 2 years ago | (#39918605)

see man mtree for details.

the answer is not "file" (2)

vlm (69642) | about 2 years ago | (#39918615)

Unix "file" is not the answer. For some formats it does as little as look at a couple of header bytes. It's a great tool to guess a format. It's a terrible verifying parser and does nothing to verify content.

An example of what I'm getting at, with some made-up details: unfortunately HTML is not like well-formed XML, and every viewer is different anyway, so the best way to figure out if an HTML file is structurally corrupt is, unfortunately, to pull it up in Firefox. That only detects corruption in the structure of the file; if the corruption is just a couple of bits then you end up with problems like tQis, where the only way to see that the h got fouled up is to write more or less an IQ 100 artificial intelligence. All "file" is going to test is pretty much whether the file begins with or contains a regex something like less-than html greater-than (getting past the filters).

For content you could F around with, for example, piping an mp3 file through a decoder and then through an averaging spectrum analyzer to see if there's anything overly unusual in the spectrum. Also some heuristics, like: if the file is only 1 second long, then it's F'ed up.

Re:the answer is not "file" (0)

Anonymous Coward | about 2 years ago | (#39918691)

This has implications for steganography too. If you could solve this, it would make it a lot harder to hide data.

Why are there no good desktop filesystems? (0)

Anonymous Coward | about 2 years ago | (#39918629)

Why isn't there even one filesystem for desktop PCs which stores per-block checksums in the metadata? We store the freakin' last-accessed timestamp whenever a file is read, but no checksums?
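What per-block checksums buy you can be sketched in a few lines of Python. This is a userspace illustration with invented names and an assumed 4096-byte block size, not how any particular filesystem stores its metadata: record one CRC-32 per block while the data is good, then recompute later to see which blocks changed.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for the illustration


def block_checksums(data, block_size=BLOCK_SIZE):
    """CRC-32 of each consecutive block of data."""
    return [zlib.crc32(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def damaged_blocks(data, recorded, block_size=BLOCK_SIZE):
    """Return indexes of blocks whose checksum differs from the recorded list."""
    current = block_checksums(data, block_size)
    bad = [i for i, (now, then) in enumerate(zip(current, recorded)) if now != then]
    # Blocks that exist on only one side (the length changed) count as damaged too.
    bad.extend(range(min(len(current), len(recorded)), max(len(current), len(recorded))))
    return bad
```

Knowing which block is bad is what lets a filesystem with redundant copies repair just that block instead of failing the whole file.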

Re:Why are there no good desktop filesystems? (1)

vlm (69642) | about 2 years ago | (#39918941)

This reminds me of parity and ECC memory battles of decades past. OK, so it detects an error... Then what? Shut off the power? Not really sure what you'll be gaining. The sole example where it works is when you have the policy and budget to replace anything that takes an error. Useless for this situation.

Re:Why are there no good desktop filesystems? (1)

nyctopterus (717502) | about 2 years ago | (#39919147)

Then what? Restore from last (good) backup, instead of propagating the corrupted file through the backup system until the good version is lost, surely?

Using "find" and "file" (0)

Anonymous Coward | about 2 years ago | (#39918641)

Well, I don't own nor use any Apple products, but I do have some Linux experience.

It would depend entirely on the type of corruption (fully corrupt vs randomly corrupt), but you could try the find command, combined with the file command, to detect a file's MIME type (which, if the file is zeroed or garbage, would be obvious).

For example:

    find /path/to/start/at -type f -exec file "{}" \;

That would look in "/path/to/start/at" and everything below, for all files ("-type f"), and for each one found, would run the command ("-exec file '{}'") on it. The last part is "\;" since you need to tell the find command that there is "no more", but since ";" already means something to the shell, we need to escape it with a backslash ("\").

The output you get will depend on what it is you are running the command on. If it is an image, file will report such. If it just says "data" or "empty", it is probably corrupt, or worth manually investigating.
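The "empty" and zeroed-out cases can also be picked out directly. A minimal Python sketch, with invented function names, that lists files which are zero-length or contain nothing but NUL bytes; it will not notice files that were overwritten with other garbage.

```python
import os


def is_blank(path, chunk_size=1 << 20):
    """True if the file is empty or contains only NUL bytes."""
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            if chunk.count(b"\x00") != len(chunk):
                return False
    return True


def find_blank_files(root):
    """Return sorted relative paths of empty or all-zero files under root."""
    blank = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            if is_blank(full):
                blank.append(os.path.relpath(full, root))
    return sorted(blank)
```

Files that are legitimately empty will show up too, so check the list before deleting anything.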

right filesystem (2)

zdzichu (100333) | about 2 years ago | (#39918673)

You need a good filesystem, with embedded data checksums and self-healing using redundant copies. For Linux - btrfs is fine. For Mac OS X & Linux - ZFS.

Re:right filesystem (1)

ltwally (313043) | about 2 years ago | (#39918789)

The best filesystem to survive a crash is a filesystem designed for an operating system that is expected to crash: NTFS.

Re:right filesystem (1)

Volanin (935080) | about 2 years ago | (#39918869)

Author here:

The problem lies in finding a filesystem that can be accessed by all three OSes. I would go with NTFS as well, but last time I tried, MacOS could not write to it. Every guide out there recommends FAT32, but the 4GB file size limitation is a deal breaker for me.

Re:right filesystem (1)

cpu6502 (1960974) | about 2 years ago | (#39919103)

I use RAR to split the >4GB files in half. To date I've only needed to do that once (a DVD rip).

Re:right filesystem (2)

vux984 (928602) | about 2 years ago | (#39919113)

10.5 and 10.6 (and, I assume, 10.7) have NTFS read/write support, but it's not enabled by default and is not officially supported.

http://hints.macworld.com/article.php?story=20090913140023382 [macworld.com]

Also, you are using Paragon HFS+ for Windows... you should already be aware they have Paragon NTFS for Mac.

A bigger question is whether NTFS is the best filesystem to use, and that's a separate question entirely. And that's a question I don't know the answer to.

So, if the primary OS was windows... then I'd use NTFS.

But if you spend most of your time in linux, and do most of the filesystem writing from linux... then I'd probably pick something robust and linux-native, and then get solutions for OSX and Windows to read it...

Re:right filesystem (2)

Shoe Puppet (1557239) | about 2 years ago | (#39919121)

NTFS-3G supports writing to NTFS. AFAIK, most Linux distributions use it instead of the kernel driver and there's an OS X port as well.

Re:right filesystem (1)

Githaron (2462596) | about 2 years ago | (#39918897)

The best filesystem to survive a crash is a filesystem designed for an operating system that is expected to crash: NTFS.

I don't know if I should laugh or ask what evidence you have that NTFS is the "best".

Tech Tool Pro, perhaps (3, Informative)

Anonymous Coward | about 2 years ago | (#39918677)

Tech Tool Pro, over on the Mac side, has a "File Structures" check which looks at a lot of different structured file types to make sure that their internal format is valid.

Reed Solomon to the rescue (1)

mpol (719243) | about 2 years ago | (#39918685)

It's already too late, but I keep important files with par2 files. That way, when there's like 5% corruption, I can still fix the file.
I do this with flac files and some datafiles.
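
A rough sketch of that workflow, assuming the par2cmdline tool is installed as "par2". protect_file and check_file are hypothetical helper names, and -r5 asks for roughly 5% redundancy to match the figure above.

```shell
# Sketch using par2cmdline (assumed to be installed as "par2").
# protect_file and check_file are hypothetical helper names.

# Create recovery data (about 5% redundancy) next to the given file.
protect_file() (
    cd "$(dirname "$1")" || exit 1
    name=$(basename "$1")
    par2 create -r5 "$name.par2" "$name" >/dev/null
)

# Verify a protected file; if verification fails, attempt a repair.
check_file() (
    cd "$(dirname "$1")" || exit 1
    name=$(basename "$1")
    par2 verify "$name.par2" >/dev/null 2>&1 && exit 0
    par2 repair "$name.par2" >/dev/null
)
```

The recovery files only help if they were created before the damage happened, and if they are damaged too, the repair may not be possible.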

Also make sure you keep backups going. I guess this was your warning. Everyone needs one.

Re:Reed Solomon to the rescue (1)

Jazari (2006634) | about 2 years ago | (#39918911)

I'll second that. QuickPar ( http://www.quickpar.org.uk/ [quickpar.org.uk] ) has been exceptionally useful to me over and over again. I can check file integrity, recover minor corruption, and revert to past file states if I accidentally modify old archived files. It's also free. The only unfortunate thing is that it doesn't seem to be under development anymore, but at least it still works with Win7/64.

For archival purposes, I've started using WinRAR ( http://www.rarlabs.com/ [rarlabs.com] ) with the file authenticity and recovery options checked. Unfortunately none of this helps you now, but it will help in the future at least...

Guess you should have used a real filesystem (-1)

Anonymous Coward | about 2 years ago | (#39918695)

Something like NTFS, and you'd still have your shit. All it takes is a power outage to take out a freetard OS. Guess if they were paid for their work...

A lot of corrupt files? (4, Interesting)

19thNervousBreakdown (768619) | about 2 years ago | (#39918697)

That seems very strange--the only files that should really be corrupted, unless something extremely rare and catastrophic happened, are the ones that were being written when power went out, or were cached. And even then, a flush usually flushes everything, or at least whole files at once, or areas of disk. Is the partition highly fragmented or something?

I know this doesn't do much for your question, but that kind of failure mode is almost exactly what filesystems do their damnedest to avoid. HFS+, being journaled, should be even more proof against, well, exactly what happened to you. Maybe the Linux driver is poor, but man, if you got silent data corruption on a multitude of files that weren't even being written, that's really bad and the driver should be classified "EXPERIMENTAL" at best, and certainly not compiled into distros' default kernels.

To answer your question, I don't have experience with any tools (I automate my backups, and any archival files go on a RAID volume that does a full integrity scan nightly), but once you find one, you should separate your files into two categories--"must be good", and "can be bad". The "must be good" files (serial #s, source code, etc.), you hand-check, so you know for certain that every one of them is good. It'll also motivate you to replace them now, instead of later when replacements will only get harder to come by. The "can be bad" files (music, pictures, etc.), you do the automated check on and then just delete as you run into ones that the check missed. This has the advantage of concentrating your effort into where it's useful. If you try to check all of your files, you'll just burn out before you finish. You may even want to do more advanced triaging, but you'll have to come up with the categories and criteria there. The main thing is, split this problem up.

Well, for one thing (0)

Anonymous Coward | about 2 years ago | (#39918705)

First, copy to an external disk all files that are on your hard disk and not on your backup.
Next, compare the files on the hard disk with the backup, and copy to that same external disk all the files whose MD5 sum is different AND last modified date is later than the backup.
Then, wipe the hard disk.
Then, restore the backup.
Finally, just look at the files on the external disk. There won't be nearly as many of those.
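
The first two steps could be sketched roughly as follows. This only lists the candidates rather than copying them, and it uses cmp for a byte-for-byte comparison in place of MD5 sums, since both trees are available at the same time. list_candidates is a hypothetical helper name; it assumes the live tree and the backup share the same relative layout, that file names contain no newlines, and that the shell's test builtin supports -nt (bash and dash do).

```shell
# Sketch: list files worth setting aside before wiping and restoring.
# list_candidates is a hypothetical helper; $1 is the live tree and
# $2 is the backup tree.
list_candidates() {
    live=${1%/}
    backup=${2%/}
    find "$live" -type f | while IFS= read -r path; do
        rel=${path#"$live"/}      # path relative to the live tree
        bak="$backup/$rel"
        if [ ! -e "$bak" ]; then
            # Step 1: the file is not in the backup at all.
            printf 'new\t%s\n' "$rel"
        elif ! cmp -s "$path" "$bak" && [ "$path" -nt "$bak" ]; then
            # Step 2: contents differ AND the live copy is newer.
            printf 'changed\t%s\n' "$rel"
        fi
    done
}
```

Files that differ from the backup but are not newer are deliberately left out: under the scheme above they are assumed to be corrupted copies that the restore will replace.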

mplayer/mencoder (or ffmpeg) & imagemagick (4, Informative)

Bonteaux-le-Kun (1360207) | about 2 years ago | (#39918739)

You can just run mencoder or ffmpeg on all the mp3 and mov files (with a small shell script, probably involving 'find' or similar) and tell it to write the output to /dev/null. That should go through those files as fast as they can be read from disk and abort with an error on those that are broken. For the jpgs, you could try something similar with imagemagick's 'convert', converting them to whatever format and writing to /dev/null, which also needs to read the whole file content and aborts if they're broken (one should hope). Those converters are really fast, especially ffmpeg, so that should complete in a reasonable time.
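
A minimal sketch of such a script, assuming ffmpeg and ImageMagick's convert are on the PATH. The function names are hypothetical. Because ffmpeg can exit successfully even after logging decode errors, check_media treats any error output as a failure; check_image passes -regard-warnings so that warnings (a truncated JPEG is typically reported as one) also count as failures, and writes to ImageMagick's "null:" output, which discards the result.

```shell
# Sketch: decode each file completely, throw the output away, and report
# the files that fail. All function names here are hypothetical.

# Audio/video: a non-zero exit or any error-level message means "broken".
check_media() {
    err=$(ffmpeg -nostdin -v error -i "$1" -f null - 2>&1) && [ -z "$err" ]
}

# Images: convert to the "null:" output; warnings count as failures.
check_image() {
    convert -regard-warnings "$1" null: >/dev/null 2>&1
}

# report_broken CHECKER FILE...: print every file the checker rejects.
report_broken() {
    checker=$1
    shift
    for f in "$@"; do
        "$checker" "$f" || printf '%s\n' "$f"
    done
}
```

For example, `report_broken check_media ~/Music/*.mp3` or `report_broken check_image ~/Pictures/*.jpg` (both paths are placeholders) would print only the files that failed to decode cleanly. It is worth reviewing that list before deleting anything, since a file with a minor glitch may still be mostly usable.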

Not have bit rot in the first place (1)

Anonymous Coward | about 2 years ago | (#39918749)

Having dealt with file corruption first hand after bit rot on some older media, the best recommendation I have is to find a solution which will prevent future bit rot. I have been using Great Lakes SAN (http://glsan.com), who base their system on zfs. Their system uses 256-bit checksums on all files and can detect and correct any/all file issues which may occur. Furthermore, backups, off-site disaster recovery and system monitoring are built into their system.

Re:Not have bit rot in the first place (0)

Anonymous Coward | about 2 years ago | (#39918801)

So does zfs do checksumming of all files?

Re:Not have bit rot in the first place (1)

0123456 (636235) | about 2 years ago | (#39918925)

So does zfs do checksumming of all files?

Yes. All filesystem blocks, I believe.

Check why the files are corrupted (5, Insightful)

ncw (59013) | about 2 years ago | (#39918753)

I'd be asking myself why lots of files became corrupted from one dodgy file system event. Assuming HFS works like file systems I'm more familiar with, it will allocate sequential blocks for files wherever it can. This means that a random filesystem splat is really unlikely to corrupt loads and loads of files. You might expect a file system corruption to cause a load of files to go missing (if a directory entry is corrupted) or corrupt a few files, but not put random errors into loads of files.

I'd check to see whether files I was writing now get corrupted too. It might be dodgy disk or RAM in your computer.

The above might be complete paranoia, but I'm a paranoid person when it comes to my data, and silent corruption is the absolute worst form of corruption.

For next time, store MD5SUM files so you can see what gets corrupted and what doesn't (that is what I do for my digital picture and video archive).
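
One way to do that, sketched under the assumption that a GNU-style md5sum is available (on Mac OS X it would need to be installed separately). write_manifest and verify_manifest are hypothetical helper names, and the manifest is kept in a file called MD5SUMS at the top of the tree.

```shell
# Sketch: keep an MD5 manifest for a directory tree and re-check it later.
# write_manifest and verify_manifest are hypothetical helper names.

# Record a checksum for every file under DIR in DIR/MD5SUMS.
write_manifest() (
    cd "$1" || exit 1
    find . -type f ! -name MD5SUMS -exec md5sum {} + > MD5SUMS
)

# Re-check DIR against DIR/MD5SUMS and print only the entries that are
# not OK. Exits 0 if everything matches, 1 otherwise.
verify_manifest() (
    cd "$1" || exit 2
    bad=$(md5sum -c MD5SUMS 2>/dev/null | grep -v ': OK$')
    [ -z "$bad" ] && exit 0
    printf '%s\n' "$bad"
    exit 1
)
```

Run write_manifest once while the archive is known to be good, then verify_manifest after any suspicious event. Files that were legitimately edited since the manifest was written will also show up as FAILED, so this suits archives that rarely change.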

A suggestion: Instead of triple booting... (2)

Sepiraph (1162995) | about 2 years ago | (#39918851)

I'd recommend running a base OS and then running something like VMware Workstation so that you can run the other OSes inside the main OS. One huge benefit is that you can have access to multiple OSes at the same time and you don't need to reboot into them either. With hypervisor technology getting common on the desktop, there probably isn't any need to multi-boot unless you have a specific reason not to use virtualization.

Use JHOVE (2)

mattpalmer1086 (707360) | about 2 years ago | (#39918927)

The JSTOR/Harvard Object Validation Environment:

http://hul.harvard.edu/jhove/ [harvard.edu]

It's specifically designed to first probabilistically identify files, then attempt to verify their format.

Disclaimer: I haven't worked on it directly, but I did spend a number of years in the digital preservation space, so I probably know some of the people who have contributed to it.

Get Rid Of Paragon! (5, Interesting)

Lord_Jeremy (1612839) | about 2 years ago | (#39919031)

Alright now I'm afraid I can't help with your verify problem but I do have one piece of solid advice: get rid of Paragon HFS immediately!

It is a truly shoddy piece of software that as of version 9.0 has a terrible bug that will cause it to destroy HFS+ filesystems. Google "paragon hfs corruption" and you will see many many horror stories from people who just plugged a Mac OS X disk into a Windows machine w/ Paragon HFS and then discovered the entire filesystem was hosed. In my dual-boot win/mac setup I replaced my copy of MacDrive with a trial version of Paragon HFS 9.0 from their website and every single one of the six HFS+ disks I had connected internally was damaged. Disk Utility couldn't do a thing and I had to buy a program called DiskWarrior to even begin to recover data. I ended up losing two disks' worth of files anyway.
http://www.mac-help.com/t12137-opened-hfs-drive-win7-paragon-hfs-now-wont-boot.html [mac-help.com]
http://www.wilderssecurity.com/showthread.php?t=299306 [wilderssecurity.com]
http://hardforum.com/showthread.php?t=1677099 [hardforum.com]
http://www.avforums.com/forums/apple-mac/1509344-hfs-super-block-not-found.html [avforums.com]

Whew! Anyway, the pain I went through after that software very nearly ruined my life was so great, I don't want it to happen to anyone else. According to their own website [paragon-software.com], 9.0 has this awful bug but they fixed it in 9.0.1. Evidently the trial download on the main page is still for version 9.0 and still has the disk-destroying bug! Any software company that releases a filesystem driver with this terrible a bug (not to mention the numerous reports of BSODs and other relatively minor problems) clearly has terrible quality assurance and simply can't be trusted.

Re:Get Rid Of Paragon! (1)

Volanin (935080) | about 2 years ago | (#39919163)

Author here:

Just out of curiosity, I went to check the version of my Paragon installer and guess what... it was corrupted! Oh the irony!
Windows is the OS I use least, and I have not booted it for the last month or so, so Paragon could only be to blame if it silently corrupted something earlier and somehow "weakened" the filesystem integrity since. Anyway, thanks for the tip. What do you use currently to read HFS+ in Windows?

It may be that simple (1)

the_B0fh (208483) | about 2 years ago | (#39919063)

Just have your OSX do a repair - it could be that certain VTOC or directory tables were damaged, and a repair may fix it. The files themselves should be OK, but the pointers to them are fubared.

Also try something like http://www.cgsecurity.org/wiki/PhotoRec [cgsecurity.org] or similar to recover deleted files. There's one for OSX. Run a repair first, then PhotoRec, and you should get most of your crap back.

backup strategy to prevent this (1)

osssmkatz (734824) | about 2 years ago | (#39919133)

You clearly need an image-based backup system to prevent this from happening again. It needs to be a cron job (or a Task Scheduler task) that runs at regular intervals when storage is available. Ideally, it needs to be network storage, so that a sudden disconnect (absence of power) cannot easily corrupt the backup. There are options for you (an open source version of Ghost, partd, rsync), though I am relatively new to Linux so I don't know which is the appropriate one for you. You could use Time Machine if you had a separate partition, but I think that isn't what you want. Also, fundamentally, writing to one partition from three OSes is asking for trouble.