
GCC 4.3.0 Exposes a Kernel Bug

ohxten sends news from earlier this month that GCC 4.3.0's new behavior of not clearing the direction flag before a string operation on x86 systems poses problems with kernels — such as Linux and BSD — that do not clear the direction flag before a signal handler is called, despite the ABI specification.
  • Yep, (Score:5, Funny)

    by EkriirkE ( 1075937 ) on Wednesday March 19, 2008 @12:28AM (#22791944) Homepage
    That's what happens when you don't clear that STD...
  • so what (Score:5, Insightful)

    by Brian Gordon ( 987471 ) on Wednesday March 19, 2008 @12:29AM (#22791956)
    OK so the kernel developers add a single line of code, the bugzilla ticket is closed, and we get on to real news?
    • Re:so what (Score:5, Insightful)

      by OverlordQ ( 264228 ) on Wednesday March 19, 2008 @12:37AM (#22791992) Journal
      FTFA:

      This problem has existed for 15 years; GCC has always emitted code that worked correctly on kernels that did not follow the ABI, until now.

      Part of the problem is that there are an enormous number of installed kernels that are vulnerable to this problem, but only if GCC 4.3 is installed.


      That's, quite literally a fuckton of systems. So simply patching new kernels isn't going to make the problem go away.
      • Re:so what (Score:5, Insightful)

        by Creepy Crawler ( 680178 ) on Wednesday March 19, 2008 @12:41AM (#22792024)
        Over-reacting a bit, aren't we?

        This bugfix is easily regressed, and has already been done.

        If somebody wants to stick with a buggy kernel, they can use an older version of GCC. It's not like older stable ones put out horrible binary or anything (we need to exempt RH using 2.96, cause that was ages ago).
        • Re:so what (Score:5, Insightful)

          by evanbd ( 210358 ) on Wednesday March 19, 2008 @12:45AM (#22792058)
          Unless, of course, it turns out to be a security hole. The sysadmin installed GCC isn't the only way code gets on to systems. Besides, a lot of packages are shipped as binaries built with modern GCC, whatever that may be. This is going to be a pain to fix, even though the fix is simple.
          • Oh I see the problem.. now that GCC isn't turning out broken binaries, old kernels will be unable to run them. Everyone will be forced to upgrade, or more likely everyone will still make broken binaries.
            • Re: (Score:3, Informative)

              by RupW ( 515653 ) *

              now that GCC isn't turning out broken binaries, old kernels will be unable to run them
              GCC never turned out broken binaries. It turned out overly-conservative binaries that cleared the direction flag even when the ABI spec said it could assume the flag was already clear.
          • Re: (Score:3, Insightful)

            by qbwiz ( 87077 ) *
            Of course, the security holes will only be in programs that were compiled with GCC 4.3.0. It's not as if some unprivileged user could cause problems merely by compiling something with a new version of GCC, but it will still be a problem if a trusted person uses GCC 4.3.0 to compile and run a program which would become exploitable.
      • I'm a consultant, and I'm wondering what the billing rate times a fuckton is going to total out to.
      • Re:so what (Score:4, Funny)

        by serviscope_minor ( 664417 ) on Wednesday March 19, 2008 @01:55AM (#22792416) Journal
        Is a fuckton more or less than a metric assload?
        • Re:so what (Score:5, Funny)

          by xaxa ( 988988 ) on Wednesday March 19, 2008 @08:28AM (#22794064)
          It depends, the US Fuckton is less than a metric assload, but the Imperial Fuckton, previously used in the UK, was more.

          NB The use of 'assload' without the 'metric' qualifier is discouraged, the customary US assload being a much greater mass.
      • Re:so what (Score:5, Interesting)

        by Codifex Maximus ( 639 ) on Wednesday March 19, 2008 @02:49AM (#22792654) Homepage
        Ok, I read the article and a lot of the comments.

        Seems to me the easy and correct thing to do would be to use deprecation. i.e. keep the old functionality for a bit longer and also patch or make the new kernels properly set the flag right now. This way, we move in the right direction and when it's no longer an issue then we drop the functionality in the compiler and rely on the kernel setting the flag like it's supposed to do.

        Now, I see why the kernels have not been setting the flag. Why should they when the compiler was doing it? Time to set things right though... in the interests of portability with other environments and compilers. Having the kernels set the flag starting now would satisfy ABI compatibility with the other compilers, AND having gcc continue to cover the flag by default for a time would prevent breakage of a lot of existing code.

        Seems like a no brainer to me. After all, isn't that what deprecation is for?

        That's my take on it...
      • by HeroreV ( 869368 )

        That's, quite literally a fuckton of systems. So simply patching new kernels isn't going to make the problem go away.

        Other compilers, like ICC from Intel, do not set the flag. That's, quite literally a fuckton of binaries already out in the wild. So simply patching GCC isn't going to make the problem go away either.

        The problem is in the kernel, and GCC cannot solve that. This problem will exist whether GCC adds an ugly hack or not. Even if GCC had never changed their behavior, this would still be a problem for other compilers.

        • Re: (Score:3, Interesting)

          by torstenvl ( 769732 )
          Actually - and I attribute this to good ol' BK - GCC *could* make the problem go away, by recognizing when it is compiling the kernel, and inserting the code itself.

          Just sayin'.

          Read this -- http://cm.bell-labs.com/who/ken/trust.html [bell-labs.com]
        • Re: (Score:3, Informative)

          by petermgreen ( 876956 )
          Well afaict the debian developers plan to modify gcc 4.3 so it behaves in the old way to reduce the risk of crashes when upgrading from one version of debian to the next. Dunno if gcc upstream will agree on that reasoning though. This isn't perfect though, even before gcc's behaviour changed there was still a risk that a signal handler would break the code that it interrupted.

          Afaict this bug only affects a relatively small number of apps because little code messes with the direction flag in the first place
      • by makomk ( 752139 )
        Yeah - I think basically the only OSes that follow the ABI on this are SCO Unix (probably because they wrote the ABI in question) and possibly Solaris. Ones that don't include every single past version of Linux and *BSD (all variants).
    • Re: (Score:2, Interesting)

      OK so the kernel developers add a single line of code, the bugzilla ticket is closed, and we get on to real news?

      Yes, Probably, a single line of code might fix it. (And I won't even call it a bug.)

      But before getting over this, I want to say kudos to gcc developers who have taken care to warn about this.

  • Kernel bug (Score:5, Funny)

    by Harmonious Botch ( 921977 ) * on Wednesday March 19, 2008 @12:36AM (#22791988) Homepage Journal
    Better than a general fault.
  • by Anonymous Coward on Wednesday March 19, 2008 @12:47AM (#22792076)
    GCC 4.3.0's new behavior of not clearing the direction flag before a string operation on x86 systems poses problems with kernels -- such as Linux and BSD -- that do not clear the direction flag before a signal handler is called, despite the ABI specification.

    Oh my GOD! If this is true, that means- that means-- it... the-

    Uh, what does it mean exactly?
    • by EkriirkE ( 1075937 ) on Wednesday March 19, 2008 @01:00AM (#22792138) Homepage
      When scanning strings for, say, a null terminator, the direction flag determines whether the current memory register gets incremented or decremented after each byte check. It could mean strlen returns 0 if your strings are grouped together in a segment of memory, or it just plain returns the wrong result. Also, memory copy routines could copy the wrong part of memory to the wrong place and overwrite executable code (or just cause a page/segment fault).
      • by Anonymous Coward on Wednesday March 19, 2008 @01:13AM (#22792188)
        I'm sorry, I'll need a car analogy on that one.
        • by EkriirkE ( 1075937 ) on Wednesday March 19, 2008 @01:32AM (#22792290) Homepage
          In x86 (assumed from here on) assembly, there are some 'quick' operations to read, write, and test memory (LODS*, STOS*, SCAS* respectively - there are probably more). The CPU has registers, or variables that are counters, or hold the memory addresses in question - in these cases a source memory position and a destination memory position. When you perform these commands, the memory registers either increment or decrement in value (position) depending on how the direction flag is set. GCC is assuming the flag is clear and the pointers will increment - go forward after each call. If the direction flag is set incorrectly upon calling these string or memory functions, the pointers could go backwards and thus copy (or scan) the wrong chunk of memory to the wrong destination.

          Say our source memory contains:

          Address: 0123456789ABCDEFGHIJKLMNOPQRSTUV
          Contents: XXXXXXXXA car is heavy.-XXXXXXXX


          Let's pretend the hyphen is a null (the string terminator or "stop" in most languages and OSes). If I want to perform a strlen on that string at position '8', it should return 15 characters because it found the null at 'N'. If the direction flag is wrong, it will not scan 8, 9, A, ... but 8, 7, 6, ... until it finally finds a null or crashes with an access violation.

          And with memory, I want to copy 5 bytes from '8' to position 'P'. If that works correctly, we get this in memory:

          Address: 0123456789ABCDEFGHIJKLMNOPQRSTUV
          Contents: XXX-!@#$A car is heavy.-XA carXX


          However, if the direction is wrong, we will get:

          Address: 0123456789ABCDEFGHIJKLMNOPQRSTUV
          Contents: XXX-!@#$A car is heav!@#$AXXXXXX


          See how '8' copied to 'P' as expected, but decrementing we then get '7' to 'O', etc.

          We now have corrupt memory. If we do a strlen, strcat or other null-expecting function on that string located at '8' we will see garbage where the memory copy wrote the wrong data to the wrong position. For the nitpickers, this example used per-byte operations; there are 16-, 32-, and 64-bit variants of the functions that would cause similar problems, but in 2-, 4-, and 8-byte chunks.
          • Re: (Score:2, Informative)

            by EkriirkE ( 1075937 )
            Oops, source memory was supposed to be (better aligned, too):

            Address: 0123456789ABCDEFGHIJKLMNOPQRSTUV
            Content: XXX-!@#$A car is heavy.-XXXXXXXX
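The scan and copy walk-through above can be replayed in a small simulation. What follows is a minimal Python sketch, not real x86: `scan_for_nul` and `copy_bytes` are hypothetical helpers that model a SCASB-style scan and a REP MOVSB-style copy, `df` stands in for the direction flag, and `MEMORY` uses the corrected source contents with a real NUL where the post shows a hyphen.

```python
# Toy model of x86 string instructions; illustrative only, not real assembly.
# Addresses 0-9 then A-V in the post map to indices 0..31 here.

MEMORY = "XXX\0!@#$A car is heavy.\0XXXXXXXX"  # the post writes NUL as "-"


def scan_for_nul(mem, start, df):
    """Model a SCASB-style scan for NUL: step +1 if DF is clear, -1 if set.

    Returns the number of steps taken before the NUL was found, or None if
    the scan walks off the buffer (a real program would likely fault).
    """
    step = -1 if df else 1
    pos = start
    while 0 <= pos < len(mem):
        if mem[pos] == "\0":
            return abs(pos - start)
        pos += step
    return None


def copy_bytes(mem, src, dst, count, df):
    """Model REP MOVSB: copy `count` bytes, both pointers stepping per DF."""
    step = -1 if df else 1
    cells = list(mem)
    for _ in range(count):
        cells[dst] = cells[src]
        src += step
        dst += step
    return "".join(cells)


def show(mem):
    """Render NUL as '-' the way the post does."""
    return mem.replace("\0", "-")
```

With `df=False`, `scan_for_nul(MEMORY, 8, False)` gives 15 and `show(copy_bytes(MEMORY, 8, 25, 5, False))` reproduces the "works correctly" row. With `df=True`, the scan stops after 5 steps at the stray null at address '3', and the copy yields the corrupted row from the post.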
          • by dido ( 9125 )

            I wonder if anyone still actually uses the old LODS/STOS/MOVS/CMPS instructions, and these are the only instructions affected by the direction flag. As far as I can tell, on modern x86 systems they are significantly slower than the equivalent multi-instruction versions that read/write/compare via register indirection, i.e. RISC-style code, and they are even slower yet than using MMX or SSE instructions to copy data, if they are available. I don't think that compilers are smart enough to use, say, a MOVSD i

            • by faragon ( 789704 )
              You're right. These instructions became useless, because a much faster implementation was possible, once the Pentium Pro introduced out-of-order execution [wikipedia.org] and enhanced branch prediction. I'm not sure whether unrolling can actually be much faster on the original Pentium, but I'm convinced that you could get a notable speed-up, if not in the string scan case, at least in the memcpy case using the FPU for 64-bit transfers.
          • by faragon ( 789704 )
            These "quick operations" are not quick anymore; on modern -out of order- x86 processors (P4, PentiumM/Core, K7, K8), explicit string search (still without using SIMD tricks) is from 2x to 3x -using SSE2 prefetch- faster than the microprogrammed code, as you can unroll loops without conditional jump penalty.
          • Yup, and another problem is that there are instructions that leave the direction flag undefined, a random value of either 0 or 1. Therefore one has to always explicitly set the direction flag before using it.
        • by Neon Spiral Injector ( 21234 ) on Wednesday March 19, 2008 @01:44AM (#22792364)
          The rules of the road say that you should check that the car is in drive before setting out on your trip. The older version of GCC used to put the car into drive for you. But the new version lets you leave it in reverse if you don't check making you exit out the rear wall of your garage.
          • by RupW ( 515653 ) * on Wednesday March 19, 2008 @06:31AM (#22793474)

            The rules of the road say that you should check that the car is in drive before setting out on your trip. The older version of GCC used to put the car into drive for you. But the new version lets you leave it in reverse if you don't check making you exit out the rear wall of your garage.
            That's not quite right. In this case:
            • the rules of the road say that you can assume you'll find your car in drive
            • the old version of GCC used to always check anyway and put the car in drive for you; the new version just assumes the car is already in drive, because that's what the rules say.
            The problem comes when an affected kernel temporarily hands your car over to a signal handler - let's say "parking valet". The valet now doesn't bother checking the car is in drive when he gets in, because the rules of the road say the kernel should have given him the car in drive. In the past GCC looked over his shoulder to make sure the kernel had really left the car in drive for him. But now no-one bothers checking for him and he might then accidentally crash your car.
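The valet scenario can be sketched as a four-way table of kernel behaviour against compiler behaviour. This is a toy Python model under stated assumptions: `Cpu`, `handler_copy`, `deliver_signal` and `outcomes` are hypothetical names, the handler is reduced to reporting which way a string operation would run, and the save and restore of the interrupted code's flags is a simplification of what a real signal frame does.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    df: bool = False  # direction flag: False = forward, True = backward


def handler_copy(cpu, emits_cld):
    """Signal handler body that performs a string operation.

    emits_cld=True models the older GCC behaviour described in the thread
    (clear DF before the string operation); emits_cld=False models
    GCC 4.3.0, which trusts the ABI's promise that DF is clear on entry.
    """
    if emits_cld:
        cpu.df = False
    return "backward" if cpu.df else "forward"


def deliver_signal(cpu, kernel_clears_df, emits_cld):
    """Kernel interrupts user code and enters the handler.

    Assumption of this model: the interrupted code's flag is saved before
    the handler runs and restored afterwards.
    """
    saved_df = cpu.df
    if kernel_clears_df:
        cpu.df = False
    result = handler_copy(cpu, emits_cld)
    cpu.df = saved_df
    return result


def outcomes():
    """Direction the handler sees when the interrupted code had DF set."""
    table = {}
    for kernel_clears_df in (False, True):
        for emits_cld in (False, True):
            cpu = Cpu(df=True)  # interrupted in the middle of a backward copy
            table[(kernel_clears_df, emits_cld)] = deliver_signal(
                cpu, kernel_clears_df, emits_cld)
    return table
```

In this model only one of the four combinations runs backward: a kernel that does not clear the flag paired with a handler built without the defensive clear. Either fixing the kernel or keeping the compiler's extra clear is enough to make the handler run forward.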

        • by SL Baur ( 19540 )

          I'm sorry, I'll need a car analogy on that one.
          It means that you are never sure if your car is in gear or in reverse. So you don't know which direction you will go when you step on the gas.
  • What this really exposes is not a bug in any kernel. Indeed, the story states that the "bug" exists in both the BSD and Linux kernels. It really exposes something fascinating about the development process: Code is written based on certain assumptions and a working theory of how the code will function once put into use, but the only way to really know how well it works is to hand it over to the ultimate judge of code correctness--the computer--by running the code. If it works, case closed. Now it's entirely

    • by Alex Belits ( 437 ) * on Wednesday March 19, 2008 @03:27AM (#22792826) Homepage

      It really exposes something fascinating about the development process: Code is written based on certain assumptions and a working theory of how the code will function once put into use, but the only way to really know how well it works is to hand it over to the ultimate judge of code correctness--the computer--by running the code. If it works, case closed.
      Please don't ever again offer your great insight into software development process. If everything was stuffed into the kernel (or other software projects) once it compiles and runs, we would drown in unstable, crashing, insecure, impossible to debug code. Without any doubt, there are plenty of geniuses (some of them in Northwestern US) who develop in this manner, but I can assure you, neither Linux kernel, nor GCC, glibc or other major open source projects use this procedure. If you want to discuss this method further I recommend you to send your opinion to a friendly individual at djb@cr.yp.to .

      Before anything is released, people have to LOOK AT THE CODE and make sure that the source gives them a reason to think, it will run correctly when used with interfaces that it is supposed to utilize or provide. There are plenty of things in the kernel that would require massive amount of testing to be verified with any certainty, so people write usable code not because they are testing it until their hardware breaks but because they know what they are doing.

      Now it's entirely possible that the kernel developers never heard of this obscure nuance of the Intel processor. Then one day, the compiler changed, and with it, the assumptions changed. Mature code that has been declared good years ago seemingly breaks. Now it's easy to blame the code, but really this is a deletion of a feature from the compiler. Nevertheless, it exposes the fact that ultimately, no matter what tools we use and no matter how well we think our code through, you can only consider the code good once it runs and appears to do what it's supposed to.
      What the hell are you talking about?

      Code generated by a C compiler remains consistent regardless of the version, unless you mix binaries built with different versions of GCC. When the code that the kernel uses to pass control to applications' signal handlers does not set the direction flag as the ABI requires, then userspace code -- ANY CODE THAT CONTAINS SIGNAL HANDLERS -- compiled by a new compiler will not work correctly. In other words, the kernel provides an interface that is incompatible with binaries made by a new GCC, and since the standard is on the side of the new GCC behavior, it's the kernel that has to be changed. That's all. Nothing else is involved -- some code compiled with a new compiler will not work on an old kernel. Code compiled with an old compiler remains usable with a new kernel; no sources except for five lines in the kernel [lwn.net] have to be changed. It's not even something that a C programmer has any control over unless he writes pieces of his program in assembly -- and then he should know. I don't even believe that a C programmer who knows how to write a signal handler could have "never heard of this obscure nuance of the Intel processor" -- both are very rarely used directly -- but this is completely irrelevant: the only sources that have to be changed are five lines in the kernel, not in signal handlers.

      The only real problem this "exposes" is that for some reason everyone who used the x86 SysV ABI for anything that matters (Linux and BSD) decided to change the interface to exclude the requirement to clear the direction flag, even though that "official" standard said otherwise -- however, it was known from the very beginning, and this is why older C compilers took it into account in the first place. It's not a bug or someone's lack of knowledge, it's a violation of a standard, and GCC developers decided to get things back to the letter of the standard because the compiler's optimization benefits from it.
    • by mav[LAG] ( 31387 )
      Now it's entirely possible that the kernel developers never heard of this obscure nuance of the Intel processor.

      Far from being an obscure nuance, CLD and STD are just ordinary instructions which tell the processor which direction the next SCAS, LODS or STOS instruction must go. They are explained very early on in most assembly tutorials that I've come across.

      A kernel developer who's never heard of the processor's direction flag has no business writing kernel code.

  • by Chris Pimlott ( 16212 ) on Wednesday March 19, 2008 @01:28AM (#22792264)
    This article is not yet public for non-subscribers. The link given is supposed to be for a subscriber to forward to a friend; putting it up on Slashdot goes against the intended spirit and does not help support Linux Weekly News, which deserves the community's support.
    • Re: (Score:3, Insightful)

      Alternatively it's a good way to get additional exposure for LWN, as clearly this article is of some value. Maybe 0.0001% of slashdot readers will subscribe because of this.

      Besides, we're all friends here, aren't we?

    • by Corbet ( 5379 ) on Wednesday March 19, 2008 @09:45AM (#22794684) Homepage
      FWIW, I originally posted the subscriber link in question to reddit yesterday. I'm surprised to see it show up here, but I also don't mind that it has happened. I'd just as soon not see all LWN content on Slashdot as subscriber links (Slashdot readers probably agree), but this one has brought some attention and, I think, some subscribers. And that's where LWN content comes from in the first place.
  • History repeating (Score:3, Informative)

    by Brett Johnson ( 649584 ) on Wednesday March 19, 2008 @02:50AM (#22792658)
    I seem to recall the MS-DOS 2.x suffered this same problem with either the Int 21 or Int 13 interfaces. (Hey it was 20 years ago, I don't remember the details.) If you made certain BDOS calls with the direction flag set, the message "A evird rorre etirw daeR" ("Read write error drive A" backwards) would be displayed on the console. It wasn't fixed for years. I remember we rigorously enforced the "Clear the direction flag before calling into MS-DOS" rule.

  • That means that all other compilers behave like the old GCCs in this case. Otherwise they would have exposed this bug already. So GCC's new behaviour could be seen as either non-standard or "innovative".
    • Nope, other compilers always (?) did it this way - at least according to TFA.
      (There is a list of 'other compilers' in there somewhere)
  • Debian, RedHat et al aren't going to release new packages compiled with GCC 4.3.0 for every damn binary. Instead, they'll hold back on providing an update to GCC and they won't compile any updated packages with the updated GCC until the next major release.

    Of course, that's not very helpful if you depend on closed-source software and the vendor won't tell you what compiler they use. Neither is it particularly helpful if you run Gentoo (which sooner or later will expect you to upgrade compiler) or if you're
    • They don't need to. All they need to do is release an updated kernel.
    • by WK2 ( 1072560 )
      True. Major distros will hold back on upgrading to gcc 4.3.0. Unless they already upgraded. For the most part, this bug will only cause headaches (and possibly suicides) to people trying to diagnose issues in their code, either because they didn't get the memo, and are using gcc 4.3.0, or because they are helping someone with run-time issues, who are using gcc 4.3.0. If I remember correctly, we had similar problems with gcc 4.0.x. I don't recall any reported deaths.
  • Most experienced assembler programmers know better than to assume the direction flag will be set or cleared unless this is specifically documented.
    • by RupW ( 515653 ) *

      Most experienced assembler programmers know better than to assume the direction flag will be set or cleared unless this is specifically documented.
      That's the whole point - it *is* explicitly documented but the old GCC used to explicitly clear it anyway. The new GCC assumes everyone's following the documentation and doesn't bother with the extra clear.
  • by flyingfsck ( 986395 ) on Wednesday March 19, 2008 @05:19AM (#22793182)
    I fixed this bug in 1989 in an Intel C compiler. That was some years before the GCC project was started. Some people never learn...
    • Re: (Score:3, Funny)

      by X3J11 ( 791922 )

      I fixed this bug in 1989 in an Intel C compiler. That was some years before the GCC project was started. Some people never learn...

      From http://en.wikipedia.org/wiki/GNU_Compiler_Collection [wikipedia.org]:

      Originally named the GNU C Compiler, because it only handled the C programming language, GCC 1.0 was released in 1987, and the compiler was extended to compile C++ in December of that year.

      Perhaps the error in your assertion is a side effect of an uncleared direction flag.

  • Assembler code (Score:2, Interesting)

    by hemanhedman ( 84515 )
    Does this mean that you could hand-craft some assembler code that exploits virtually all Linux and BSD-kernels out there?
  • OMG OMG OMG! My kernel is vulnerable!!

    - regs->flags &= ~(X86_EFLAGS_TF);
    + regs->flags &= ~(X86_EFLAGS_TF | X86_EFLAGS_DF);

    make

    done.
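The effect of that two-line diff can be checked with plain integer arithmetic. This is a sketch in Python rather than kernel code: `regs->flags` is modeled as an int, the two function names are hypothetical, and the bit positions are the x86 EFLAGS ones (trap flag at bit 8, direction flag at bit 10).

```python
# x86 EFLAGS bit positions: TF (trap flag) is bit 8, DF (direction) is bit 10.
X86_EFLAGS_TF = 1 << 8
X86_EFLAGS_DF = 1 << 10


def flags_for_handler_old(flags):
    """Mask from the '-' line of the diff: only the trap flag is cleared."""
    return flags & ~X86_EFLAGS_TF


def flags_for_handler_new(flags):
    """Mask from the '+' line: trap flag and direction flag both cleared."""
    return flags & ~(X86_EFLAGS_TF | X86_EFLAGS_DF)
```

Given flags with both bits set, the old mask leaves the direction flag on for the signal handler while the new mask clears it; every other bit passes through unchanged in both versions.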
  • Oh No (Score:3, Funny)

    by fluffykitty1234 ( 1005053 ) on Wednesday March 19, 2008 @11:40AM (#22796074)
    I just heard that this has seriously set back the release date of Duke Nukem Forever!
