
Apps That Rely On Ext3's Commit Interval May Lose Data In Ext4

timothy posted more than 5 years ago | from the heavy-trade-off dept.

Data Storage 830

cooper writes "Heise Open posted news about a bug report for the upcoming Ubuntu 9.04 (Jaunty Jackalope) which describes a massive data loss problem when using Ext4 (German version): A crash occurring shortly after the KDE 4 desktop files had been loaded results in the loss of all of the data that had been created, including many KDE configuration files." The article mentions that similar losses can come from some other modern filesystems, too. Update: 03/11 21:30 GMT by T : Headline clarified to dispel the impression that this was a fault in Ext4.


Not a bug (5, Informative)

casualsax3 (875131) | more than 5 years ago | (#27157149)

It's a consequence of not writing software properly. Relevant links from later in the same comment thread, for those who might otherwise miss them:

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45 [launchpad.net]

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54 [launchpad.net]

Bull (4, Insightful)

Jane Q. Public (1010737) | more than 5 years ago | (#27157285)

Blaming it on the applications is a cop-out. The filesystem is flawed, plain and simple. The journal should not be written so far in advance of the records actually being stored. That is a recipe for disaster, no matter how much you try to explain it away.

Re:Bull (5, Funny)

Lord Ender (156273) | more than 5 years ago | (#27157609)

In fact, there is no such thing as an OS bug! All good programmers should re-implement essential and basic operating system features in their user applications whenever they run into so-called "OS bugs." If you question this, you must be a bad programmer, obviously.

Re:Bull (1, Insightful)

Jane Q. Public (1010737) | more than 5 years ago | (#27157741)

Delayed allocation is like leading a moving target when shooting. The more distant the target, the more you have to lead, and the greater chance there is of something happening between the time you pull the trigger and the time the bullet reaches its target zone: the wind may shift, the target may change speed, or direction, etc.

The longer you delay allocation after writing the journal (and Ext4 seems to take this to extremes), the more chance there is of something -- almost anything really -- going wrong between the time the journal is written and the time the files are actually written. And here is just such a case of something changing state (whether it should or not) between those times. You may call it an anomaly, but a competent engineer would have to expect this to occur.

Re:Bull (5, Insightful)

wild_berry (448019) | more than 5 years ago | (#27157629)

The journal isn't being written before the data. Nothing is written for periods of 45-120 seconds, so as to batch the writes up into efficient lumps. The journal is there to make sure that the data on disk makes sense if a crash occurs.

If your system crashes after a write hasn't hit the disk, you lose either way. Ext3 was set to write at most 5 seconds later. Ext4 is looser than that, but with associated performance benefits.

Re:Bull (1)

Jane Q. Public (1010737) | more than 5 years ago | (#27157779)

That contradicts TFA, which clearly states that there is a delay of up to 150 seconds between the time the journal is written and the time the data is actually written to disk.

Re:Bull (2, Insightful)

gweihir (88907) | more than 5 years ago | (#27157855)

The problem is KDE not doing syncs and not keeping backups on updates of critical files. Any competent implementor will try to keep such updates to a minimum with critical files and, if they have to be done, do them carefully. Seems to me the KDE folks have to learn a basic lesson in robustness now.

Re:Bull (5, Informative)

Anonymous Coward | more than 5 years ago | (#27157635)

This is NOT a bug. Read the POSIX documents.

Filesystem metadata and file contents are NOT required to be synchronous, and a sync is needed to ensure they are synchronised.

It's just down to retarded programmers who assume they can truncate/rename files and any pending data writes will magically meet up, a la ext3 (which, btw, has a mount option that does not sync automatically).

RTFPS (Read The Fine POSIX Spec).

Re:Bull (4, Insightful)

Eugenia Loli (250395) | more than 5 years ago | (#27157761)

Rewriting the same file over and over is known to be risky. The proper sequence is to create a new file, sync, rename the new file on top of the old one, and optionally sync again. In other words, app developers must be more careful about what they do, not put all the blame on the filesystems. There's only so much that an fs can do to avoid such brouhahas. Many other filesystems have behavior similar to ext4, btw.
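
(A minimal sketch of the create/sync/rename sequence described above, using plain POSIX calls; the function name, the temporary-file path parameter and the abbreviated error handling are placeholders, not code from the thread:)

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write the new contents to a temporary file, force them to disk, then
 * atomically rename over the old file. A crash at any point leaves
 * either the complete old file or the complete new file. */
int save_file(const char *path, const char *tmp_path,
              const void *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len) {  /* short writes not retried in this sketch */
        close(fd);
        return -1;
    }
    if (fsync(fd) < 0) {   /* ensure the new data is on disk before the rename */
        close(fd);
        return -1;
    }
    close(fd);
    return rename(tmp_path, path);  /* atomic replace of the old file */
}

Strictly speaking, making the rename itself durable would also require an fsync() on the containing directory; the sketch only covers the data-before-rename ordering being discussed here.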

Re:Bull (2, Interesting)

Jane Q. Public (1010737) | more than 5 years ago | (#27157821)

That does not make it any less of a filesystem limitation. While it is true that a well-written app should be aware of potential timing issues, all the application itself should ever suffer is delays in the I/O. Anything else is a flaw. Other FSs may share the flaw, but it is still a flaw.

Re:Bull (1)

Profane MuthaFucka (574406) | more than 5 years ago | (#27157901)

Except fsync on a Mac is a null operation. The fsync(), it does nothing!

Re:Bull (1)

gweihir (88907) | more than 5 years ago | (#27157815)

Replacing critical files without syncing and without keeping backups is non-robust behaviour, plain and simple. Seems to me some KDE implementers were getting lazy. Most text editors do it better.

Re:Bull (5, Informative)

pc486 (86611) | more than 5 years ago | (#27157841)

Ext3 doesn't write out immediately either. If the system crashes within the commit interval, you'll lose whatever data was written during that interval. That's only 5 seconds of data if you're lucky, much more data if you're unlucky. Ext4 simply made that commit interval and backend behavior different than what applications were expecting.

All modern fs drivers, including ext3 and NTFS, do not write immediately to disk. If they did then system performance would really slow down to almost unbearable speeds (only about 100 syncs/sec on standard consumer magnetic drives). And sometimes the sync call will not occur since some hardware fakes syncs (RAID controllers often do this).

POSIX doesn't define flushing behavior when writing and closing files. If your applications needs data to be in NV memory, use fsync. If it doesn't care, good. If it does care and it doesn't sync, it's a bad application and is flawed, plain and simple.

Re:Bull (1)

erroneus (253617) | more than 5 years ago | (#27157911)

An application should not care what file system it is running on. If applications written "normally" have less chance of catastrophic failure, then you can most certainly blame the apps for attempting to take advantage of a particular feature of a particular file system. And if an application does behave this way, it should be written to first determine if it is, in fact, running on that particular file system and if not should disable any features that utilize that filesystem's advantages.

This is a problem of applications making faulty assumptions.

Re:Bull (1)

knewter (62953) | more than 5 years ago | (#27157921)

You're wrong about this. The second comment covers the appropriate way to write the code, and via POSIX can guarantee that you don't lose data.

Hoping you've done something right isn't enough.

Re:Not a bug (5, Insightful)

mbkennel (97636) | more than 5 years ago | (#27157323)

I disagree. "Writing software properly" apparently means taking on a huge burden for simple operations.

Quoting Ts'o:

"The final solution, is we need properly written applications and desktop libraries. The proper way of doing this sort of thing is not to have hundreds of tiny files in private ~/.gnome2* and ~/.kde2* directories. Instead, the answer is to use a proper small database like sqllite for application registries, but fixed up so that it allocates and releases space for its database in chunks, and that it uses fdatawrite() instead of fsync() to guarantee that data is written on disk. If sqllite had been properly written so that it grabbed new space for its database storage in chunks of 16k or 64k, and released space when it was no longer needed in similar large chunks via truncate(), and if it used fdatasync() instead of fsync(), the performance problems with FireFox 3 wouldn't have taken place."

In other words, if the programmer took on the burden of tons of work and complexity in order to replicate lots of the functionality of the file system and make it not the file system's problem, then it wouldn't be my problem.

I personally think it should be perfectly OK to read and write hundreds of tiny files. Even thousands.

File systems are nice. That's what Unix is about.

I don't think programmers ought to be required to treat them like a pouty flake: "in some cases, depending on the whims of the kernel and entirely invisible moods, or the way the disk is mounted that you have no control over, stuff might or might not work."

Re:Not a bug (1)

virtue3 (888450) | more than 5 years ago | (#27157435)

... and ultimately who is to say the database won't eventually be flawed, because whoever programmed THAT has to work around the whole bloody filesystem?

The filesystem should definitely be abstracted to the point where the software does not need to do anything super special (having to tell the OS to manually flush its cached writes is insane).

Mind you, this is pretty heavy OS code, so YMMV. Bottom line, even these guys shouldn't fucking care if you're using ext4 or fat32 (just an example! and yes, there are exceptions, but in the general software case, you shouldn't need to care) at the end of the day.

Re:Not a bug (1)

gweihir (88907) | more than 5 years ago | (#27157471)

Well, everybody should by now have noticed that fsync and fdatasync are not the same anymore (they were in Linux 2.2). Still, both should get your data reliably to disk (unless the disk does write buffering). Not using either and then expecting your data to be on disk is indeed an implementation problem on the application side.

I was unable to find a system call named "fdatawrite". It seems that one does not exist or at least is an experimental and very new feature.
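
(A minimal illustration of the fsync()/fdatasync() distinction discussed above; the wrapper function is just a placeholder, not anything from the thread:)

#include <unistd.h>

/* fsync() flushes the file's data and all of its metadata;
 * fdatasync() flushes the data plus only the metadata needed to read
 * it back (such as the new file size), which can be cheaper. */
int flush_to_disk(int fd, int data_only)
{
    return data_only ? fdatasync(fd) : fsync(fd);
}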

Re:Not a bug (5, Interesting)

Qzukk (229616) | more than 5 years ago | (#27157501)

I personally think it should be perfectly OK to read and write hundreds of tiny files. Even thousands.

It is perfectly OK to read and write thousands of tiny files. Unless the system is going to crash while you're doing it and you somehow want the magic computer fairy to make sure that the files are still there when you reboot it. In that case, you're going to have to always write every single block out to the disk, and slow everything else down to make sure no process gets an "unreasonable" expectation that their data is safe until the drive catches up.

Fortunately his patches will include an option to turn the magic computer fairy off.

Re:Not a bug (3, Insightful)

Hatta (162192) | more than 5 years ago | (#27157547)

The proper way of doing this sort of thing is not to have hundreds of tiny files in private ~/.gnome2* and ~/.kde2* directories. Instead, the answer is to use a proper small database like sqllite for application registries

Translation: "Our filesystem is so fucked up, even SQL is better."

WTF is this guy thinking? UNIX has used hundreds of tiny dotfiles for configuration for years and it's always worked well. If this filesystem can't handle it, it's not ready for production. Why not just keep ALL your files in an SQL database and cut out the filesystem entirely?

Re:Not a bug (1)

gweihir (88907) | more than 5 years ago | (#27157917)

The problem is not the tiny files. The problem is opening lots of files and rewriting them without keeping backups or doing syncs. That is inherently non-robust and should never be done with critical files.

The database is just an example of how to do it robustly and fast. fdatasyncing a lot of small files is bound to be slow. Databases have better performance on a lot of small updates.

Still, the blame is on the KDE people, who use a very risky update pattern in a place where it is completely inappropriate.

Re:Not a bug (1)

malkir (1031750) | more than 5 years ago | (#27157593)

"in some cases, depending on the whims of the kernel and entirely invisible moods, or the way the disk is mounted that you have no control over, stuff might or might not work." Sounds a lot like ActionScript...

Re:Not a bug (5, Informative)

Anonymous Coward | more than 5 years ago | (#27157655)

Quoting Ts'o:

"The final solution, is we need properly written applications and desktop libraries. The proper way of doing this sort of thing is not to have hundreds of tiny files in private ~/.gnome2* and ~/.kde2* directories. Instead, the answer is to use a proper small database like sqllite for application registries, but fixed up so that it allocates and releases space for its database in chunks, ...

Linux reinvents windows registry?
Who knows what they will come up with next.

Re:Not a bug (3, Insightful)

GigsVT (208848) | more than 5 years ago | (#27157661)

Instead, the answer is to use a proper small database like sqllite for application registries

Yeah, linux should totally put in a Windows style registry. What the fuck is this guy on.

Re:Not a bug (5, Funny)

Profane MuthaFucka (574406) | more than 5 years ago | (#27158005)

That would be smart, but only if the SQL database is encrypted too. It's theoretically possible to read a registry with an editor, and we can't have that. Also, we need a checksum on the registry. If the checksum is bad, we have to overwrite the registry with zeroes. Registries are monolithic, and we have to make sure that either it's good data, or NONE of it is good data. Otherwise the user would get confused.

I am so excited about this that I'm going to start working on it just as soon as I get done rewriting all my userspace tools in TCL.

Re:Not a bug (1)

jgostling (1480343) | more than 5 years ago | (#27157699)

Quoting Ts'o:

"...use a proper small database like sqllite for application registries..."

Is it just me or does this sound an awful lot like the dreaded Windows registry...

Re:Not a bug (5, Insightful)

Logic and Reason (952833) | more than 5 years ago | (#27157717)

I personally think it should be perfectly OK to read and write hundreds of tiny files. Even thousands.

To paraphrase https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54 [launchpad.net] : You certainly can use tons of tiny files, but if you want to guarantee your data will still be there after a crash, you need to use fsync. And if that causes performance problems, then perhaps you should rethink how your application is doing things.

Re:Not a bug (5, Insightful)

davecb (6526) | more than 5 years ago | (#27157797)

It seems exceedingly odd that issuing a write for a non-zero-sized file and having it delayed causes the file to become zero-size before the new data is written.

Generally when one is trying to maintain correctness, one allocates space, places the data into it and only then links the space into place (paraphrased from Barry Dwyer's "One more time - how to update a master file", Communications of the ACM, January 1981).

I'd be inclined to delay the metadata update until after the data was written, as Mr. Ts'o notes was done in ext3. That's certainly what I did back in the days of CP/M, writing DSA-formatted floppies (;-))

--dave

Re:Not a bug (1)

lilo_booter (649045) | more than 5 years ago | (#27157809)

Have to agree - suggesting a db to replace 'hundreds of small files' is an appalling attitude. Doesn't even make sense that a developer who's ever used a source code repo would think that was reasonable.

Unlikely scenario, but say there's a source code repo running on ext4, and as a developer, I want to make changes to hundreds of files in the repo. I check in and the server goes bang -- what's the repo state likely to be? How/why would you use a db to implement the repo? Why should the repo be patched to run specifically on that file system?

Bizarre...

Re:Not a bug (1, Insightful)

somenickname (1270442) | more than 5 years ago | (#27157847)

Beyond that, he's essentially advocating the Windows Registry. He's a very smart person but, Unix is about dot files. If you take them away, you take away the "Unixness" of the machine. I don't care if a filesystem isn't pleased by hundreds or thousands of tiny config files. That's how the machine works. Make your filesystem handle it.

Cordially,

An ext4 user.

Re:Not a bug (5, Informative)

OeLeWaPpErKe (412765) | more than 5 years ago | (#27157867)

Let's not forget that the only consequence of delayed allocation is the write-out delay changing. Instead of data being "guaranteed" on disk in 5 seconds, that becomes 60 seconds.

Oh dear God, someone inform the president! Data that is NEVER guaranteed to be on disk according to the spec is only guaranteed on disk after 60 seconds.

You should not write your applications to depend on filesystem-specific behavior. You should write them to the standard, and that means fsync(). No call to fsync? Look it up in the documentation (man 2 write).

The rest of what Ted Ts'o is saying is optimization, speeding up the boot time for gnome/kde; it is not necessary for correct operation.

Please don't FUD.

You know, I'll look up the docs for you:

(quote from man 2 write)

NOTES
       A successful return from write() does not make any guarantee that data has been committed to disk. In fact, on some buggy implementations, it does not even guarantee that space has successfully been reserved for the data. The only way to be sure is to call fsync(2) after you are done writing all your data.

       If a write() is interrupted by a signal handler before any bytes are written, then the call fails with the error EINTR; if it is interrupted after at least one byte has been written, the call succeeds, and returns the number of bytes written.

That brings up another point, almost nobody is ready for the second remark either (write might return after a partial write, necessitating a second call)

So the normal case for a "reliable write" would be this code:

size_t written = 0;
ssize_t r = write(fd, &data, sizeof(data));
while (r >= 0 && written + r < sizeof(data)) {
        written += r;
        r = write(fd, (const char *)&data + written, sizeof(data) - written);
}
if (r < 0) { // error handling code, at the very least looking at EIO, ENOSPC and EPIPE for network sockets
}

and *NOT*

write(fd, data, sizeof(data)); // will probably work

Just because programmers continuously use the second method (just check a few sf.net projects) doesn't make it the right method (and as there is *NO* way to fix write to make that call reliable in all cases you're going to have to shut up about it eventually)

Hell, even firefox doesn't check for either EIO or ENOSPC and certainly doesn't handle either of them gracefully, at least not for downloads.

Re:Not a bug (1)

Jurily (900488) | more than 5 years ago | (#27157871)

In other words, if the programmer took on the burden of tons of work and complexity in order to replicate lots of the functionality of the file system and make it not the file system's problem, then it wouldn't be my problem.

In other words, if the programmer took on the burden to use a proper database interface for their database, it could have been optimized as such.

I do agree, however, that data loss is inexcusable under any circumstances. Isn't that why we have journals in the first place?

A Windows-like registry can not be the answer. (0)

Anonymous Coward | more than 5 years ago | (#27157883)

Ts'o may argue that we can't "have hundreds of tiny files in private ~/.gnome2* and ~/.kde2* directories", but isn't UNIX philosophy all about having just that? And isn't it the filesystem's job to handle the files? Fix EXT4!

Re:Not a bug (0)

Anonymous Coward | more than 5 years ago | (#27157923)

Guess what, writing software is hard!

Writing good software is even harder!

If you are lazy go take a job at McDonalds so that you don't have to think when you work ...

Re:Not a bug (0, Redundant)

ickpoo (454860) | more than 5 years ago | (#27157973)

Clearly I won't be using Ext4 for a long while. The attitude of Ts'o indicates that he doesn't really know the purpose of a file system. It doesn't matter how capable this guy is, he is an idiot.

Suggesting that this is the domain of the application is crazy. Writing a bunch of small files is par for the course for many applications; suddenly all these apps need to be reworked? The app wrote the file, probably received no error messages indicating that something might be wrong, and closed the file, and yet the file system is losing data. The file system is suspect.

Re:Not a bug (1)

Conley Index (957833) | more than 5 years ago | (#27157345)

I already wondered about the heise.de title blaming the file system. Now Slashdot repeats it.

I have seen the same on FreeBSD using UFS (with soft updates).

KDE4 is supposed to be portable enough to run on file systems that have neither data journaling nor any guarantee that operations on different files are written in a certain order unless a sync is issued.

Re:Not a bug (1)

icebike (68054) | more than 5 years ago | (#27157557)

It's worse than a KDE problem. It goes to the heart of Linux/Unix, which has always depended on a multitude of small text files.

Anytime you suggest users re-write their entire code base to get around a bug you've created, your professional pride should well up, grab you by the wattles and slap you till you spit.

Re:Not a bug (3, Insightful)

idontgno (624372) | more than 5 years ago | (#27157351)

lol.

It's a consequence of a filesystem that makes bad assumptions about file size.

I suppose in your world, you open a single file the size of the entire filesystem and just do seek()s within it?

It's a bug. A filesystem which does not responsibly handle any file of any size between 0 bytes and MAXFILESIZE is bugged.

Deal with it and join the rest of us in reality.

Re:Not a bug (4, Insightful)

TerranFury (726743) | more than 5 years ago | (#27157523)

Ummm... it deals correctly with files of any size. It just loses recent data if your system crashes before it has flushed what it's got in RAM to disk. That's the case for pretty much any filesystem; it's just a matter of degree, and how "recent" is recent.

Re:Not a bug (4, Insightful)

fireman sam (662213) | more than 5 years ago | (#27157657)

The benefit of journaling file systems is that after a crash you still have a file system that works. How many folks remember when Windows would crash, resulting in an HDD that was so corrupted the OS wouldn't start? Same with ext2.

If these folks don't like asynchronous writes, they can edit their fstab (or whatever) to have the sync option so all their writes will be synchronous and the world will be a happy place.

Note that they will also have to suffer a slower system, and a possibly shortened lifetime of their HDD, but at least their configuration files will be safe.
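
(A sketch of what such an fstab entry might look like; the device, mount point and field values are placeholders:)

# /etc/fstab -- force all writes on this filesystem to be synchronous
/dev/sda2   /home   ext4   defaults,sync   0   2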

Re:Not a bug (-1, Flamebait)

idontgno (624372) | more than 5 years ago | (#27157805)

it deals correctly with files of any size. It just loses recent data

You work in marketing, don't you? Only an advertising weenie could actually speak those two phrases consecutively with a straight face.

If it loses recent data, under any conditions, it's bugged. Period. Full stop. End of line. Close tag.

Listen up. Here's exactly what is supposed to happen. I open() a file in the filesystem, creating it in the process. I write() one byte to it. I close() the file. The data is physically on disk within milliseconds.

OR ELSE THE FILESYSTEM IS BUGGED.

Re:Not a bug (2, Insightful)

CyprusBlue113 (1294000) | more than 5 years ago | (#27157899)

Unless you have an explicit sync there, YOUR ASSUMPTION IS BUGGED. This is completely reasonable behavior of a write caching system.

Re:Not a bug (4, Insightful)

caerwyn (38056) | more than 5 years ago | (#27157941)

No. It's not.

If what you say is true there would be no need for the fsync() function (and related ones).

Read the standards if you want. The filesystem is only bugged if it loses recent data under conditions where the application has asked it to guarantee that the data is safe. If the app hasn't asked for any such guarantee by calling fsync() or the like, the filesystem is free to do as it likes.

Re:Not a bug (2, Insightful)

Anonymous Coward | more than 5 years ago | (#27157969)

You're wrong, and so are most comments here.

When you open() a file in the filesystem, write() one byte to it, and close() that file, you haven't really guaranteed crap on any normal filesystem, unless you're using a very strange filesystem or you're using non-standard mount options to force every action to happen synchronously.

If a crash happens between close() and the filesystem flushing data to disk, you will lose data. If you want to prevent this happening, you must either use calls like fsync() or fdatasync() (or many other mechanisms that act similarly), or use mount options that make all calls synchronous.

The only reason this has become a big blow-up issue with ext4 is that while other filesystems generally would sync the data shortly anyways, ext4 does not. Everyone has been relying on bad assumptions about filesystem behavior and getting by on the fact that "usually" the situation was resolved "somewhat quickly". ext4 does not resolve these things quickly, in the name of efficiency and performance. There was never a guarantee under any filesystem of things getting done (to disk) quickly unless you explicitly ask for it.

Re:Not a bug (4, Informative)

davecb (6526) | more than 5 years ago | (#27157825)

Er, actually it removes the previous data, then waits to replace it for long enough that the probability of noticing the disappearance approaches unity on flaky hardware (;-))

--dave

Re:Not a bug (2, Interesting)

jgarra23 (1109651) | more than 5 years ago | (#27157433)

Talk about doublespeak! "Not a bug" vs. "It's a consequence of not writing software properly." Reminds me of that FG episode where Stewie says, "it's not that I want to kill Lois... it's that I don't... want... her... to... live... anymore."

Re:Not a bug (1)

Slumdog (1460213) | more than 5 years ago | (#27157791)

Talk about doublespeak! "Not a bug" vs. "It's a consequence of not writing software properly." Reminds me of that FG episode where Stewie says, "it's not that I want to kill Lois... it's that I don't... want... her... to... live... anymore."

I think you're confusing lies with mistakes/misunderstandings. Bugs are usually unknown...unintentional. Writing code improperly is an intentional act, possibly with unknown consequences. Windows Vista isn't a "Bug" (although I expect some /. smartass to assert that it is...), Vista is simply badly designed.

Re:Not a bug (1, Troll)

dedazo (737510) | more than 5 years ago | (#27157457)

You'll have to excuse me for chuckling a bit here, but if NTFS or the filesystem for OS X (whatever that is) had this problem and someone suggested that it's an "application problem" they'd be stoned to death.

As an application developer, the last thing I want to worry about is whether or not the fraking filesystem is going to persist my data to disk. That's why I write applications, and other people write file systems and kernels.

You can talk to me about good practices when doing I/O on any given platform, but please don't insult my intelligence by claiming the FS layer's failure to do something is due to my saving lots of little files, or lots of big ones, or anything in between. That's just stupid.

Re:Not a bug (5, Insightful)

msuarezalvarez (667058) | more than 5 years ago | (#27157787)

As an application developer, the last thing I want to worry about is whether or not the fraking filesystem is going to persist my data to disk.

As an application developer, you are expected to know what the API does, in order to use it correctly. What Ext4 is doing is 100% respectful of the spec.

Actually, no. (2)

Jane Q. Public (1010737) | more than 5 years ago | (#27158019)

As a user of a high-level language, I should not be expected to know the disk I/O API in a given OS. That is for the authors of the compiler or interpreter. Do not expect me to know Assembly language for a given chip, for example, in order to implement a calendar program. The very idea is ridiculous.

Re:Not a bug (1)

PinchDuck (199974) | more than 5 years ago | (#27157755)

The point of having a rock-solid filesystem is to have a rock-solid filesystem. Any filesystem that crashes and loses data is bad. What is the point of a journal again? To enforce someone's idea of how an API should be coded to, or to reduce data loss?

bah? (1)

negRo_slim (636783) | more than 5 years ago | (#27157153)

He advises that "this is really more of an application design problem more than anything else."

Re:bah? (-1, Flamebait)

Anonymous Coward | more than 5 years ago | (#27157305)

blame KDE! hahahahahahahahahaha

Now, if this affected twm, xterm, and emacs, well, I'd be quite pissed. But for now, I find this hilarious.

Don't worry (5, Funny)

sakdoctor (1087155) | more than 5 years ago | (#27157155)

Don't worry guys, I read the summary this time, and it only affects the German version of ext4.

Re:Don't worry (3, Funny)

Daimanta (1140543) | more than 5 years ago | (#27157173)

Makes perfect sense: Germans are ridiculously punctual; if the allocation is delayed, you just KNOW something is terribly wrong.

Re:Don't worry (1)

microbee (682094) | more than 5 years ago | (#27157225)

And only on a specific distro. (haha Ubuntu users)

Re:Don't worry (1)

migla (1099771) | more than 5 years ago | (#27157453)

"And only on a specific distro."

Except, the first commenter on launchpad, who is not the bug reporter, is running Gentoo.

Re:Don't worry (2, Funny)

microbee (682094) | more than 5 years ago | (#27157527)

OMG, you expect me to RTFA??!! In a BUGzilla?

If in other "modern" filesystems.... (0)

ducomputergeek (595742) | more than 5 years ago | (#27157207)

Newer !== better

Re:If in other "modern" filesystems.... (1)

Daimanta (1140543) | more than 5 years ago | (#27157317)

Yes, that's why I'll wait for ext4 SP2.

Re:If in other "modern" filesystems.... (3, Insightful)

internerdj (1319281) | more than 5 years ago | (#27157341)

It is a trade-off between reliability and performance. In this case, older !== better either. A lot of OS design decisions are trade-offs.

Re:If in other "modern" filesystems.... (3, Insightful)

CannonballHead (842625) | more than 5 years ago | (#27157533)

I'll take "I didn't lose my data" over "ext4 runs 1.5x faster than ext3," thank you. What use is performance to me if I have to be absolutely certain that it won't crash, or I lose my (in my very high performance filesystem) data?

Also, ext4 is toted as having additional reliability checks to keep up with scalability, etc... not less reliable at expense of performance.

Reliability

As file systems scale to the massive sizes possible with ext4, greater reliability concerns will certainly follow. Ext4 includes numerous self-protection and self-healing mechanisms to address this.

(from Anatomy of ext4 [ibm.com] )

I can only imagine the response if tests were done on Windows 7 beta that showed a crash after this or that resulted in loss of data. :)

Re:If in other "modern" filesystems.... (2, Insightful)

internerdj (1319281) | more than 5 years ago | (#27157659)

The thing is that ext3 is using the same strategy on a smaller scale. The same argument could be made to say that 3 seconds is far too long to be out of date. How many instructions are you going to run in 3 seconds? Defects run at 5-8 per kloc on average. Certainly not all are fatal, but how long a delay is too long to avoid a potentially fatal defect? Obviously the delay they have chosen is too long, but is the performance hit that ext3 takes for having a 3 second delay rather than a 5 or 10 or 15 second delay worth it?

Re:If in other "modern" filesystems.... (1)

CannonballHead (842625) | more than 5 years ago | (#27157943)

That probably depends on the application(s) running (presuming some sort of server application). Of course, a competent admin in that situation would hopefully choose a suitable filesystem, which may not be ext4 if the delay remains too high.

I don't know what the performance difference is from a "home user" perspective between ext3 and ext4, but if it isn't really noticeable, then why not stay with a more reliable delay time? If most users wouldn't be able to notice the performance increase, it might be better to cater towards reliability, not performance, in that situation.

Admittedly, this is all out of my hat, since I haven't done any performance tests (have you? I'd be interested in hearing first-hand experience of performance increases... will have to look online, too...)

pr0n (5, Funny)

Quintilian (1496723) | more than 5 years ago | (#27157223)

Real reason for the bug report: Someone's angry and wants his porn back.

Works as expected... (5, Insightful)

gweihir (88907) | more than 5 years ago | (#27157289)

The problem here is that delaying writes speeds up things greatly but has this possible side-effect. For a shorter commit time, simply stay with ext3. You can also mount your filesystems "sync" for a dramatic performance hit, but no write delay at all.

Anyway, with modern filesystems, data does not go to disk immediately unless you take additional measures, like a call to fsync. This should be well known to anybody who develops software and is really not a surprise. It has been done like that on server OSes for a very long time. Also note that there is no loss of data older than the write delay period, and this only happens on a system crash or power failure.

Bottom line: Nothing to see here, except a few people that do not understand technology and are now complaining that their expectations are not met.
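
(For the "sync" option mentioned above, a sketch of how an already-mounted filesystem could be switched over; the mount point is a placeholder, and the performance cost is severe:)

# remount an existing filesystem with synchronous writes
mount -o remount,sync /home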

Exactly (-1)

Jane Q. Public (1010737) | more than 5 years ago | (#27157371)

This is a design decision, and it is a problem of the filesystem, no matter how much they try to blame it on "poorly written applications". Applications should be able to do whatever they want. It is the job of the filesystem to accurately record it. Period.

If an application that reads and writes lots of small files fails under Ext4, then it is Ext4's fault, not the application's. An application should be able to read and write lots of small files if it wants... I can think of a great many practical examples.

Re:Exactly (5, Insightful)

TerranFury (726743) | more than 5 years ago | (#27157449)

Meh, this is crap that happens only when the system crashes, and is pretty much unavoidable if you're doing a lot of caching in memory -- which, coincidentally, is what you need to do to maximize performance. This doesn't sound like the filesystem's "fault" or the application's "fault;" it's just the way things are. Everybody knows that if you don't cleanly unmount, most bets are off.

Re:Exactly (0)

Anonymous Coward | more than 5 years ago | (#27157687)

Lack of data loss during unexpected power outages or shutdowns was the primary reason people adopted ext3. Journaling was supposed to fix exactly this.

Re:Exactly (1)

msuarezalvarez (667058) | more than 5 years ago | (#27157817)

No, not really. Journalling is done so that after a crash the filesystem is in a consistent state, and that does *not* include the no-data-loss requirement you are talking about.

Re:Exactly (1)

Psychotria (953670) | more than 5 years ago | (#27157785)

Your comment scares me. See my comment below. Are you sure you're not me?

Re:Exactly (0, Flamebait)

somenickname (1270442) | more than 5 years ago | (#27157947)

Wait, are you saying the crashing of an alpha level OS could cause data loss? I find this unfathomable.

Re:Exactly (5, Insightful)

gweihir (88907) | more than 5 years ago | (#27157591)

The problem is not the many small files, but the missing disk sync. The many small files just make the issue more obvious.

True, with ext4 this is more likely to cause problems, but any delayed write can cause this type of issue when no explicit flush-to-disk is done. And let's face it: fsync/fdatasync are not really a secret to any competent developer.

What however is a mistake, and a bad one, is making ext4 the default filesystem at this time. I say give it another half year, for exactly this type of problem.

Re:Exactly (1)

Psychotria (953670) | more than 5 years ago | (#27157763)

If an application that reads and writes lots of small files fails under Ext4, then it is Ext4's fault, not the application's. An application should be able to read and write lots of small files if it wants... I can think of a great many practical examples

Yeah, but it's not just ext4, it's any modern filesystem. If the application writes thousands of individual files (without fsync()) and there is a power failure or system crash, then data loss is possible. This isn't ext4's 'fault' any more than it's the application's 'fault'. It isn't a bug or a bad design decision either; it's just how things are.

Re:Exactly (1)

pc486 (86611) | more than 5 years ago | (#27157967)

Lots of small files isn't bad on its own. In fact, it's downright common. Ext4's design does consider this case and makes these operations efficient.

The problem with small files is data consistency. If the application requires a file hierarchy and associated buffers to be on disk before continuing, then a call to fsync() is required (even on ext3). Implicitly syncing on every small file will kill performance, so don't do that.

Re:Works as expected... (0)

Anonymous Coward | more than 5 years ago | (#27157411)

Bottom line: Nothing to see here, except a few people that do not understand technology and are now complaining that their expectations are not met.

Sounds like the average Ubuntu user to me ...

Re:Works as expected... (1)

NorthWay (1066176) | more than 5 years ago | (#27157437)

>Bottom line: Nothing to see here, except a few people that do not understand technology and are now complaining that their expectations are not met.

Or go for some other tech that can detect it. Like end-to-end checksumming.

Re:Works as expected... (5, Insightful)

girlintraining (1395911) | more than 5 years ago | (#27157467)

Nothing to see here, except a few people that do not understand technology and are now complaining that their expectations are not met.

You're right, there really is nothing to see here. Or rather, there's nothing left. As the article says, a large number of configuration files are opened and written to as KDE starts up. If KDE crashes and takes the OS with it (as it apparently does), those configuration files may be truncated or deleted entirely -- the commands to re-create and write them having never been sync'd to disk. As the startup of KDE takes longer than the write delay, it's entirely possible for this to seriously screw with the user.

The two problems are:

1. Bad application development. Don't delete and then re-create the same file. Use atomic operations that ensure that files you are reading/writing to/from will always be consistent. This can't be done by the Operating System, whatever the four color glossy told you.

2. Bad Operating System development. If an application kills the kernel, it's usually the kernel's fault (drivers and other code operating in privileged space is obviously not the kernel's fault) -- and this appears to be a crash initiated from code running in user space. Bad kernel, no cookie for you.

Re:Works as expected... (3, Insightful)

gweihir (88907) | more than 5 years ago | (#27157769)

I agree on both counts. Some comments:

1) The right sequence of events is this: rename the old file to a backup name (atomic), write the new file, sync the new file and then delete the backup file. For anything critical it is better to keep the backup. In any case an application should offer to recover from the backup if the main file is missing or broken. To this end, add a clear end-mark that allows checking whether the file was written completely. Nothing new or exciting, just stuff any good software developer knows.

2) Yes, a kernel should not crash. Occasionally it happens nonetheless. It is important to note that ext4 is blameless in the whole mess (unless it causes the crash).

Re:Works as expected... (0)

Anonymous Coward | more than 5 years ago | (#27157481)

Bottom line: Nothing to see here, except a few people that do not understand technology and are now complaining that their expectations are not met.

Like expectations that a filesystem not lose data just because you have small files?

Translation (3, Insightful)

microbee (682094) | more than 5 years ago | (#27157851)

We use techniques that show great performance so people can see we beat ext3 and other filesystems.

Oh shit, as a tradeoff we lose more data in case of a crash. But it's not our fault.

Honestly, you cannot eat your cake and have it too.

Re:Works as expected... (1)

hedwards (940851) | more than 5 years ago | (#27157937)

I had to reread that a few times, but it seems like a compelling argument for COW filesystems with a backend scrubber to fix any suboptimal file placements. I'm sure it isn't quite as fast, but unless the disk is constantly accessed it's probably not going to make much of a negative impact.

And depending upon the type of files most handled one could probably optimize where the default placements are made.

Firefox fix (0)

Anonymous Coward | more than 5 years ago | (#27157329)

No problem, just run Firefox and it'll make sure your disks are synch'd all the time ;)

my experience (0)

Anonymous Coward | more than 5 years ago | (#27157425)

not with ext4, but with xfs. Last month, I created an xfs logical volume and exported it with nfs (with fsync). I chose xfs because this was for large files (videos). After copying a couple of files, the xfs volume developed errors and was unrecoverable. I've never seen a file system get so fucked up so easily without hardware problems.

Programmers at Microsoft suck (1, Funny)

Xoc-S (645831) | more than 5 years ago | (#27157443)

If Microsoft hadn't written this crappy code, and they'd used Linux instead, this wouldn't have happened.

Classic tradeoff (5, Insightful)

Otterley (29945) | more than 5 years ago | (#27157445)

It's amazing how fast a filesystem can be if it makes no guarantees that your data will actually be on disk when the application writes it.

Anyone who assumes modern filesystems are synchronous by default is deluded. If you need to guarantee your data is actually on disk, open the file with O_SYNC semantics. Otherwise, you take your chances.

Moreover, there's no assertion that the filesystem was corrupt as a result of the crash. That would be a far more serious concern.

Re:Classic tradeoff (3, Informative)

imsabbel (611519) | more than 5 years ago | (#27157643)

It's even WORSE than just being asynchronous:

EXT4 reproducibly delays the write ops, but commits the journal updates concerning those writes.

Re:Classic tradeoff (2, Interesting)

slashdotmsiriv (922939) | more than 5 years ago | (#27157759)

Even if you use O_SYNC, or fsync() there is no guarantee that the data are safely stored on disk.

You also have to disable HDD caching, e.g., using
  hdparm -W0 /dev/hda1

Re:Classic tradeoff (2, Insightful)

gweihir (88907) | more than 5 years ago | (#27157989)


Even if you use O_SYNC, or fsync() there is no guarantee that the data are safely stored on disk.

You also have to disable HDD caching, e.g., using
    hdparm -W0 /dev/hda1

Well, yes, but unless you have an extreme write pattern, the disk will not take long to flush to the platter. And this will only result in data loss on power failure. If that is really a concern, get a UPS.

Theory doesn't matter; practice does (3, Interesting)

microbee (682094) | more than 5 years ago | (#27157479)

So, POSIX never guarantees your data is safe unless you do fsync(). So, ext3 was not 100% safer either. So, it's the applications' fault that they truncate files before writing.

But it doesn't matter what POSIX says. It doesn't matter where the fault lies. To the users, a system either works or it doesn't, as a whole.

EXT4 aims to replace EXT3 and become the next-gen de facto filesystem on the Linux desktop. So it has to compete with EXT3 in all regards; not just performance, but data integrity and reliability as well. If in common scenarios people lose data on EXT4 but not EXT3, the blame is on EXT4. Period.

It's the same thing that a kernel does. You have to COPE with crappy hardware and user applications, because that's your job.

Re:Theory doesn't matter; practice does (1)

MichaelSmith (789609) | more than 5 years ago | (#27157905)

So, POSIX never guarantees your data is safe unless you do fsync().

I always had the impression that closing a file descriptor does an fsync(). Surely if KDE is writing multiple small files it will be closing each file in turn?

Excuses are false. This is a severe flaw. (3, Interesting)

rpp3po (641313) | more than 5 years ago | (#27157485)

There are several excuses circulating: 1. This is not a bug, 2. It's the apps' fault, 3. all modern filesystems are at risk.
This is all a bunch of BS! Delayed writes should lose at most the data between the commit and the actual write to disk. Ext4 loses complete files (even their content from before the write).
ZFS can do it: it writes the whole transaction to disk or rolls back in case of a crash, so why not ext4? These lame excuses that this is totally "expected" behavior are a shame!

ZFS isn't invulnerable either (0, Redundant)

feld (980784) | more than 5 years ago | (#27157719)

News at 11.

Re:Excuses are false. This is a severe flaw. (2, Informative)

Anonymous Coward | more than 5 years ago | (#27157953)

> Delayed writes should lose at most the data between the commit and the actual write to disk.

And that's exactly what ext4 does.

An application decides to update some file:
1) It reads the file
2) It modifies the buffer as needed
3) It truncates the file
4) It writes the buffer back to the file

Now, if the filesystem commit happens right between 3 and 4, the truncation hits the disk, but the new content does not (yet). If a crash happens before the next commit, all that remains is the truncated file.
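
(In code, the risky update pattern enumerated above looks roughly like the following sketch; the file name and function are placeholders, not code from KDE or any real application:)

#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Truncate-in-place update with no fsync(): if the filesystem commits
 * the truncation (step 3) but crashes before the buffered data (step 4)
 * reaches disk, only a zero-length file survives. */
void risky_update(const void *buf, size_t len)
{
    int fd = open("settings.conf", O_WRONLY | O_TRUNC);  /* step 3: file is now empty */
    if (fd < 0)
        return;
    write(fd, buf, len);   /* step 4: new contents sit only in the page cache */
    close(fd);             /* close() does not imply fsync() */
}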

Bull... (-1, Flamebait)

Duncan3 (10537) | more than 5 years ago | (#27157577)

The file system isn't writing data it claims to have written; that means it's deliberately lying, which is different from a bug.

The "web 2.0" idea that being right most of the time is good enough doesn't cut it in the real world. People may put something more important than a tweet on a file system.

Optimize the reads all you want, but those writes better damn well happen before the calls that say data is written return.

I use ntfs (0)

Anonymous Coward | more than 5 years ago | (#27157663)

and I don't know what's going on behind the curtain, nor do I care. I can't recall losing any data in such a manner since, well, ever. even given the fat32 and fat16 days. there was that one time I managed to destroy someone's data with doublespace... anyways, the important thing is that I had an onion tied to my belt, which was the style at the time...

not mounted sync,dirsync? (4, Interesting)

dltaylor (7510) | more than 5 years ago | (#27157693)

When I write data to a file (either through a descriptor or FILE *), I expect it to be stored on media at the earliest practical moment. That doesn't mean "immediately", but 150 seconds is brain-damaged. Regardless of how many reads are pending, writes must be scheduled, at least in proportion of the overall file system activity, or you might as well run on a ramdisk.

While reading/writing a flurry of small files at application startup is sloppy from a system performance point of view, data loss is not the application developers' fault, it's the file system designers'.

BTW, I write drivers for a living, have tuned SMD drive format for performance, and written microkernels, so this is not a developer's rant.

Why SHOULD applications have to assume bad FSs? (1)

nweaver (113078) | more than 5 years ago | (#27157725)

True, posix says that unless you do a fsync(), the file might never be written to disk before the system crashes. But Whiskey-tango-Foxtrot?

What's wrong with "After a file is closed, it's synced to disk"?!?

excuses don't get you world domination (-1, Troll)

Anonymous Coward | more than 5 years ago | (#27157811)

sorry, but declassifying bugs as 'not our responsibility' is something microsoft used to do.

this attitude has always been an issue with open source but it was usually graphics or sound stuff. the idea that this attitude is leaking into critical components like filesystems, that kind of scares me.

Top down reliability? (0)

oneofthose (1309131) | more than 5 years ago | (#27157977)

So is this a new trend in designing systems? Make them reliable from the top down? Designing an upper-layer part of the system to work around the flaws of a lower-layer component is often necessary, but it is not the right way to do things. Telling application developers to change their applications because a new version of the file system breaks their stuff is madness. No matter what the POSIX standards say: it worked before, it is broken now: go fix it.