
How Your Compiler Can Compromise Application Security

Soulskill posted about a year ago | from the my-compiler-levels-me-out dept.

Programming

jfruh writes "Most day-to-day programmers have only a general idea of how compilers transform human-readable code into the machine language that actually powers computers. In an attempt to streamline applications, many compilers actually remove code that they perceive to be undefined or unstable — and, as a research group at MIT has found, in doing so they can make applications less secure. The good news is the researchers have developed a model and a static checker for identifying unstable code. Their checker is called STACK, and it currently works for checking C/C++ code. The idea is that it will warn programmers about unstable code in their applications, so they can fix it, rather than have the compiler simply leave it out. They also hope it will encourage compiler writers to rethink how they can optimize code in more secure ways. STACK was run against a number of systems written in C/C++ and it found 160 new bugs in the systems tested, including the Linux kernel (32 bugs found), Mozilla (3), Postgres (9) and Python (5). They also found that, of the 8,575 packages in the Debian Wheezy archive that contained C/C++ code, STACK detected at least one instance of unstable code in 3,471 of them, which, as the researchers write (PDF), 'suggests that unstable code is a widespread problem.'"
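To make the problem concrete, here is a minimal sketch of the pointer-overflow check the researchers use as their running example (variable names follow the paper; the surrounding function is omitted):

char *buf = ...;         /* start of a buffer */
unsigned int len = ...;  /* length supplied by the caller */
if (buf + len < buf)     /* intended wrap-around check */
        return;          /* pointer overflow is undefined, so an optimizing
                            compiler may treat the check as always false and
                            silently delete it */
/* write to buf[0..len-1] */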


News flash (1)

digitalPhant0m (1424687) | about a year ago | (#45274567)

Humans write unstable code.

Re:News flash (4, Informative)

war4peace (1628283) | about a year ago | (#45274687)

I would also like to understand what's the definition of "unstable code".

Re:News flash (5, Funny)

Mitchell314 (1576581) | about a year ago | (#45274775)

Code with a finite half-life. Sometimes radiates when it decays. The byproducts tend to be hazardous to health, and most cause symptoms such as headaches, tremors, Carpal Tunnel Syndrome, and Acute Induced Tourette Syndrome. Handle with care. The Daily WTF has an emergency hotline if you or somebody you know has been exposed to unsafe levels of unstable code.

Re:News flash (4, Funny)

Cryacin (657549) | about a year ago | (#45274811)

So that's why you have to restart your computer. Gets rid of dangerous radiation from weapons grade baloneyum decay.

Re:News flash (1)

Anonymous Coward | about a year ago | (#45274791)

Code that, just when you're relying on it the most, bursts into tears, slams the front door, and runs away to its mother for a week.

Re:News flash (1)

EvanED (569694) | about a year ago | (#45275029)

Didn't RTFA because this is /., but I'd guess that it's code that works now but is fragile under a change of compiler, compiler version, optimization level, or platform.

Re:News flash (2)

foobar bazbot (3352433) | about a year ago | (#45275391)

I would also like to understand what's the definition of "unstable code".

Unstable code is code such that, when you make an arbitrarily small change, you end up rewriting the entire thing.

Stable code, by contrast, is code such that when you make an arbitrarily small change, the code ends up being restored to its original state, or perhaps engaging in a bounded oscillation, where you and another coder keep changing it back and forth with every release.

Film at 11 (0)

Anonymous Coward | about a year ago | (#45274577)

Running static analysis tools on a whole repository gives lots of warnings.
Who'da thunk it?

TFA does a poor job of defining what's happening (4, Insightful)

istartedi (132515) | about a year ago | (#45274579)

If my C code contains *foo=2, the compiler can't just leave that out. If my code contains if (foo) { *foo=2; } else { return EDUFUS; } it can verify that my code is checking for NULL pointers. That's nice; but the questions remain:

What is "unstable code" and how can a compiler leave it out? If the compiler can leave it out, it's unreachable code and/or code that is devoid of semantics. No sane compiler can alter the semantics of your code, at least no compiler I would want to use. I'd rather set -Wall and get a warning.

Null pointer detection at compile time (1)

tepples (727027) | about a year ago | (#45274623)

I'd rather set -Wall and get a warning.

There are some undefined behaviors that can't be detected so easily at compile time, at least not without a big pile of extensions to the C language. For example, if a pointer is passed to a function, is the function allowed to dereference it without first checking it for NULL? The Rust language doesn't allow assignment of NULL to a pointer variable unless it's declared as an "option type" (Rust's term for a value that can be a pointer or None).
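As a rough C illustration of why (the function and variable names here are hypothetical, not from any real codebase), whether a given dereference is safe depends on every caller, which the compiler generally cannot see:

#include <string.h>   /* strlen */

size_t name_length(const char *name) {
        return strlen(name);   /* fine only if every caller passes a non-NULL pointer */
}
...
size_t n = name_length(lookup_user(id));   /* undefined behaviour if lookup_user returns NULL */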

Re:Null pointer detection at compile time (4, Insightful)

Zero__Kelvin (151819) | about a year ago | (#45274757)

"For example, if a pointer is passed to a function, is the function allowed to dereference it without first checking it for NULL?"

Of course it is, and it is supposed to be able to do so. If you were an embedded systems programmer you would know that, and also know why. Next you'll be complaining that languages allow infinite loops (again, a very useful thing to be able to do). C doesn't protect the programmer from himself, and that's by design. Compilers have switches for a reason. If they don't know how the code is being built or what its purpose is, then they can't possibly determine with another program whether the code is "unstable".

Re:TFA does a poor job of defining what's happenin (5, Informative)

Anonymous Coward | about a year ago | (#45274679)

An example of "unstable code":

char *a = malloc(sizeof(char));
*a = 5;
char *b = realloc(a, sizeof(char));
*b = 2;
if (a == b && *a != *b)
{
        launchMissiles();
}

A cursory glance at this code suggests missiles will not be launched. With gcc, that's probably true at the moment. With clang, as I understand it, this is not true: missiles will be launched. The reason for this is that the spec says that the first argument of realloc becomes invalid after the call, therefore any use of that pointer has undefined behaviour. Clang takes advantage of this, and defines the behaviour to be that *a will not change after that point. Therefore it optimises if (a == b && *a != *b) into if (a == b && 5 != *b). This clearly then passes, and missiles get launched.
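A sketch of what the condition effectively becomes under that assumption (not clang's literal output):

if (a == b && 5 != *b)   /* *a folded to the constant 5 stored before the realloc */
{
        launchMissiles();
}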

The truth here is that your compiler is not compromising application security – the code that relies on undefined behaviours is.

Re:TFA does a poor job of defining what's happenin (5, Informative)

dgatwood (11270) | about a year ago | (#45274827)

Another, more common example of code optimizations causing security problems is this pattern:

int a = [some value obtained externally];
int b = a + 2;
if (b < a) {
// integer overflow occurred ...
}

The C spec says that signed integer overflow is undefined. If a compiler does no optimization, this works. However, it is technically legal for the compiler to rightfully conclude that two more than any number is always larger than that number, and optimize out the entire "if" statement and everything inside it.

For proper safety, you must write this as:

int a = [some value obtained externally];
if (INT_MAX - a < 2) {
// integer overflow will occur ...
}
int b = a + 2;

Re:TFA does a poor job of defining what's happenin (-1)

Anonymous Coward | about a year ago | (#45275047)

The first mistake was using signed integers. Unsigned integers always have well-defined overflow (modulo semantics), which means it's easier to construct safe conditionals. In your example, if a is negative it's undefined behavior, although it'll probably work on two's-complement machines because compilers can't weasel their way out of the expression you used.

Signed arithmetic is for mathematical formulas, not general-purpose programming. We _always_ have to worry about overflow in general-purpose programming because we're always working with limited-width types. It's far easier to reason about unsigned types because, again, they have modulo semantics. Those are the semantics people always depend on when they write bad signed expressions, anyhow. And I can't remember the last time I actually needed negative values when parsing, doing I/O, or doing any of the other 99% of the things I program. You'll know when you need signed arithmetic; the rest of the time, just default to unsigned.
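For what it's worth, a minimal sketch of the unsigned variant of the grandparent's check; unsigned arithmetic wraps modulo 2^N, so the compiler may not remove the test:

unsigned int a = [some value obtained externally];
unsigned int b = a + 2;   /* wraps around; well defined */
if (b < a) {
// overflow occurred ...
}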

Re:TFA does a poor job of defining what's happenin (0)

Anonymous Coward | about a year ago | (#45275085)

Gee, I wish I programmed all the things you program! Then I'd NEVER need anything but unsigned integers! Ah, there but for the grace of God...

(In case you missed the subtext: prick. You aren't everyone and the 99% of the things *you* program probably form the 1% of the things that *I* program, and that does not make what I program worthless any more than it makes what you program worthless... unlike you, since you're an arrogant piece of shite.)

Re:TFA does a poor job of defining what's happenin (2, Informative)

EvanED (569694) | about a year ago | (#45275199)

The first mistake was using signed integers. unsigned integers always have well-defined overflow (modulo semantics), which means it's easier to construct safe conditionals

Not in C and C++ they don't. The compiler is allowed to perform that optimization with either signed or unsigned integers.

Re:TFA does a poor job of defining what's happenin (1)

EvanED (569694) | about a year ago | (#45275273)

Not in C and C++ they don't. The compiler is allowed to perform that optimization with either signed or unsigned integers.

I take back this statement... it is not correct, at least in C99.

Re:TFA does a poor job of defining what's happenin (1)

lgw (121541) | about a year ago | (#45275255)

Under C99 all machines must be both two's complement and have 8-bit bytes. IIRC both fall out from inttypes.h. Word is this wasn't intentional, but it had been so long since anyone actually used other architectures that no one noticed that implication.

Re:TFA does a poor job of defining what's happenin (1)

seebs (15766) | about a year ago | (#45275383)

This doesn't sound right to me. The intX_t types, if present, have to be two's complement, but they aren't really required to be present, as I recall.

Re:TFA does a poor job of defining what's happenin (1)

Athrac (931987) | about a year ago | (#45275387)

Under C99 all machines must be both two's complement and have 8-bit bytes. IIRC both fall out from inttypes.h. Word is this wasn't intentional, but it had been so long since anyone actually used other architectures that no one noticed that implication.

You are incorrect. C99 (and C11) still explicitly allow two's complement, one's complement and sign-and-magnitude representation for signed types. You are probably confusing it with the type definitions int8_t, int16_t etc., which ARE required to be two's complement (if they exist). But the standard does not require those type definitions to exist.

Re:TFA does a poor job of defining what's happenin (0)

Anonymous Coward | about a year ago | (#45275261)

Erm, I have to deal with negative numbers on a constant basis.

Re:TFA does a poor job of defining what's happenin (0)

Anonymous Coward | about a year ago | (#45275351)

Who are you, a government accountant?

Re:TFA does a poor job of defining what's happenin (0)

Anonymous Coward | about a year ago | (#45275227)

However, it is technically legal for the compiler to rightfully conclude that two more than any number is always larger than that number, and optimize out the entire "if" statement and everything inside it.

It's a good deal worse than that. The compiler is allowed to do ANYTHING. It can replace the code inside the if with code that sends all your customer data to your competitor. It can install a virus. Anything.

Re:TFA does a poor job of defining what's happenin (2)

Cryacin (657549) | about a year ago | (#45274829)

YOU SUNK MY BATTLESHIP!

Re:TFA does a poor job of defining what's happenin (0)

Spazmania (174582) | about a year ago | (#45275121)

That code is bad for many reasons, not the least of which is that it's semantically ambiguous whether the result of malloc() should be assigned to a or *a.

However, the compiler here clearly can't make any valid assumptions about the contents of *a following the realloc. That's what undefined means: it holds a value about which you can't make any assumptions. Because the behavior is undefined, no *valid* optimization is possible.

Clang is wrong. If it's smart enough to recognize the undefined behavior then it should (a) warn the user and (b) make no optimization attempts to any code which later references *a.

Re:TFA does a poor job of defining what's happenin (5, Funny)

lgw (121541) | about a year ago | (#45275291)

No, the compiler is allowed to do anything it damn well pleases wherever the standard calls behaviour "undefined". One of my favorite quotes ever from a standards discussion:

When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose

Nasal demons can cause code instability.

Re:TFA does a poor job of defining what's happenin (2)

Spazmania (174582) | about a year ago | (#45275385)

If I tell the compiler to give me warnings, and it detects code whose behavior is undefined in the standard but then fails to issue a warning, then the compiler is broken. If it goes on to make a fancy assumption about the undefined behavior instead of letting it fall through to runtime as written, then it's doubly broken.

Re:TFA does a poor job of defining what's happenin (0)

mysidia (191772) | about a year ago | (#45275197)

A cursory glance at this code suggests missiles will not be launched. With gcc, that's probably true at the moment. With clang, as I understand it, this is not true: missiles will be launched.

It's not quite correct. a == b is not a use of the argument that has been invalidated. a is a variable containing an address of the object that was passed by value to the realloc() function.

If the value of a were no longer valid, then the b = realloc ... assignment would not have returned the same value; therefore a == b would evaluate to false, and with the short-circuit && operator, the *a != *b test would never be executed.

Re:TFA does a poor job of defining what's happenin (0)

istartedi (132515) | about a year ago | (#45275221)

A cursory glance at this code suggests missiles will not be launched.

That's funny. My first takeaway is that the programmer is assuming malloc never fails. Let's get past that and assume that malloc and realloc both returned something. Most of us would assume it's unusual for realloc to do anything. We expect a==b to be true which makes (*a!=*b) impossible and the body of the if-block unreachable. So. I'm with you so far.

With clang, as I understand it, this is not true: missiles will be launched. The reason for this is that the spec says that the first argument of realloc becomes invalid after the call, therefore any use of that pointer has undefined behaviour. Clang takes advantage of this, and defines the behaviour to be that *a will not change after that point.

OK, if the spec says that a is undefined after the call to realloc, then IMHO the compiler should change the type of a from char * to UNDEFINED and complain. Based on what you're saying, it sounds like Clang is wrong. It sounds like they're treating undefined behavior as implementation defined behavior.

I'm sure somebody will correct me if I'm wrong on that one.

OK, before somebody else points it out... (4, Interesting)

istartedi (132515) | about a year ago | (#45275347)

My statement is contradictory. I recommended a course of action for undefined behavior, while maintaining that Clang is wrong for documenting a course of action for undefined behavior.

My understanding of "undefined behavior" in the C spec is that it means "anything can happen and the programmer shouldn't rely on what the compiler currently does". Of course, in the real world *something* must happen. If a 3rd party documents what that something is, the compiler is still compliant. It's the programmer's fault for relying on it.

OTOH, if the behavior was "implementation defined" then the compiler authors can define it. If they change their definition from one rev to another without documenting the change, then it's the compiler author's fault for not documenting it.

In other words:

undefined -- programmer's fault for relying on it.
implementation defined -- compiler's fault for not documenting it.

Re:OK, before somebody else points it out... (1)

EvanED (569694) | about a year ago | (#45275379)

I was already responding, but yes, your summary sounds pretty much perfect.

Re:TFA does a poor job of defining what's happenin (1)

EvanED (569694) | about a year ago | (#45275367)

OK, if the spec says that a is undefined after the call to realloc, then IMHO the compiler should change the type of a from char * to UNDEFINED and complain. Based on what you're saying, it sounds like Clang is wrong. It sounds like they're treating undefined behavior as implementation defined behavior.

I'm sure somebody will correct me if I'm wrong on that one.

You're wrong on that one. :-)

First, let's look at this specific case. The type of a variable can't "change", because the type of a variable in languages without type-state stuff is static. (Aside: this is a useful way to think about the distinction between statically typed languages and dynamically typed ones -- in statically typed languages, variables have types, while in dynamically typed languages, values have "types.") In this case it's pretty easy to see how the compiler can deal with it, but in general it's not:

char * b;
if (big_fancy_condition) {
    b = realloc(a, ...);
}
....
if (another_big_fancy_condition) {
    ... *b ...;
}

Can that program provoke undefined behavior? Depends on the conditions, which means it's undecidable in general. In the type viewpoint, what's the type during the ellipsis? Is it char* or is it inaccessible? It's char* down one branch but inaccessible down the other, and there's not a fully-general way to merge those two types (in a way that type-checking is still decidable).

Second, undefined behavior means the compiler is allowed to do anything -- it's less restrictive than implementation-defined. For implementation-defined behavior, the compiler needs to make a choice, stick with it (at least with a consistent set of compiler settings), and document it. For undefined behavior, the compiler can do anything it wants to, for any reason it wants to, can do different things in the same situation in different places because why the hell not, etc. -- the standard imposes no restrictions on what happens once undefined-behavior is triggered. See here [stackoverflow.com] for more.
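Two tiny examples of the distinction (my own examples, not quotes from the standard):

#include <limits.h>

int shift_right(int x) { return x >> 1; }   /* implementation-defined result for negative x:
                                               the compiler must pick one behaviour and document it */
int add_one(int x)     { return x + 1; }    /* undefined behaviour when x == INT_MAX:
                                               the standard imposes no requirements at all */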

Re:TFA does a poor job of defining what's happenin (5, Informative)

Nanoda (591299) | about a year ago | (#45274729)

What is "unstable code" and how can a compiler leave it out?

The article is actually using that as an abbreviation for what they're calling "optimization-unstable code", or code that is included at some compiler optimization levels but discarded at higher ones. Basically they call it unstable because whether it's kept or thrown away depends on the optimization level, not because the code itself necessarily results in random behaviour.

Re:TFA does a poor job of defining what's happenin (-1)

Anonymous Coward | about a year ago | (#45274755)

TFA mentions undefined code. Basically, fucked up computer languages allow "undefined code", i.e. C/C++.
If you programmed in the 90s, such code could give you a compiler warning, or a nasty surprise at runtime, plus differences across compilers, even across different versions of the same compiler.
Today, it seems some compilers simply remove the code, since it's "undefined" anyway.
So what may seem in code to be a security check, or any other valid functionality, might just instead be silently removed along with all implicated parts derived from the undefinedness.
Since it's really "undefined", the compiler might as well encrypt your entire HDD and ransom you for digital currency. It's "undefined".
It's also a great way to sneak in stealth backdoors and functionality.

What this shouldn't tell you, is how cool it is to detect elements of nasty surprises in inferior programming languages.

What this should tell you, is what programming languages you should run away from screaming.
Or at the very least, bring your warning-levels to the max in such languages, and only bring people who truly know what they're doing, on all levels of detail.

Captcha: discover

Re:TFA does a poor job of defining what's happenin (0)

Anonymous Coward | about a year ago | (#45274925)

Er... so your conclusion is that we should all "run screaming" from C/C++ and hire a bunch of people who "truly know what they're doing... on all levels of detail" with C/C++, who are liable to be a relatively limited number and command top wages.

Top plan! One would almost think you view yourself as one of those people who "truly know what they're doing"! Even "on all levels of detail!"

The third alternative is "Elucidate how compilers are (rightfully, according to the standard) introducing potential dangers, and educate people accordingly", on whatever level you feel that involves. Alas, that alternative never occurred to you. Which, given the kind of abstractions a paper I read recently pointed out -- where Clang and G++ acted quite differently and equally dangerously -- is a bit of a pity since I strongly suspect you'd probably be caught out by an unexpected trap too.

(If you think you wouldn't you've never developed professionally for a living. We all have, and no-one should be ashamed to say they have.)

Re:TFA does a poor job of defining what's happenin (2)

lgw (121541) | about a year ago | (#45275395)

, fucked up computer languages allow "undefined code", ie. C / C++.

Every language has some undefined behavior (and there are libraries with undefined behavior in every language), except maybe Ada.

Java leaves a wide area undefined when it comes to multi-threaded code.

Python has the same, plus it inherits some undefined behaviors from C.

C/C++ leaves a wide area undefined to support oddball system architectures. For example, if you have some memory that can only store floating point numbers, and some general-purpose memory, the address ranges might overlap - that's why pointer subtraction is undefined unless both pointers are within the same array. In practice most programmers can treat all memory as one contiguous byte array, but on special-purpose hardware you can still use C. Most of C's undefined behavior comes from the much wider variety of system architectures around when C was young, but it can still be useful for embedded systems.
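A quick sketch of that pointer-subtraction rule:

#include <stddef.h>

void example(void) {
        int a[10], b[10];
        ptrdiff_t ok  = &a[7] - &a[2];   /* defined: both pointers are into the same array */
        ptrdiff_t bad = &b[0] - &a[0];   /* undefined: pointers into different arrays */
        (void)ok; (void)bad;             /* silence unused-variable warnings */
}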

Re:TFA does a poor job of defining what's happenin (4, Informative)

Spikeles (972972) | about a year ago | (#45274849)

TFA links to the actual paper. Maybe you should read that.

Towards Optimization-Safe Systems:Analyzing the Impact of Undefined Behavior [mit.edu]

struct tun_struct *tun = ...;
struct sock *sk = tun->sk;
if (!tun)
        return POLLERR;
/* write to address based on tun */

For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined [24:6.5.3]. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.

Re:TFA does a poor job of defining what's happenin (4, Informative)

complete loony (663508) | about a year ago | (#45274851)

"What every C programmer should know about undefined behaviour" (part 3 [llvm.org] , see links for first 2 parts).

For example, overflows of unsigned values are undefined behaviour in the C standard. Compilers can make decisions like using an instruction that traps on overflow if it would execute faster, or if that is the only operator available. Since overflowing might trap, and thus cause undefined behaviour, the compiler may assume that the programmer didn't intend for that to ever happen. Therefore this test will always evaluate to true, so this code block is dead and can be eliminated.

This is why there are a number of compilation optimisations that gcc can perform, but which are disabled when building the linux kernel. With those optimisations, almost every memory address overflow test would be eliminated.

Re:TFA does a poor job of defining what's happenin (1)

istartedi (132515) | about a year ago | (#45275093)

For example, overflows of unsigned values are undefined behaviour in the C standard.

I'm glad I didn't know that when I used to play with software 3d engines back in the 90s. 16-bit unsigned integer "wrap around" was what made my textures tile. I do seem to vaguely recall that there was a compiler flag for disabling integer traps and that I disabled it. It was Microsoft's C compiler, and it's been a loooooong time.

OK, I'm looking through the options on the 2005 free Visual Studio... I can find a flag to disable floating point traps, but not integer. Maybe the full version lets you do that. I used to have the full version. I suppose if it were really important I could track down the magic assembly voodoo incantation to do it. I'm guessing the MS disables integer overflow traps by default...

Re:TFA does a poor job of defining what's happenin (1)

assassinator42 (844848) | about a year ago | (#45275159)

Overflows of unsigned values are well-defined in C (they wrap). (Technically the standard says unsigned values can't overflow because they're wrapped)
Overflows of signed values are undefined.

Re:TFA does a poor job of defining what's happenin (1)

mysidia (191772) | about a year ago | (#45275145)

I'd rather set -Wall and get a warning.

I see your -Wall, and raise you a -Werror -pedantic

Re:TFA does a poor job of defining what's happenin (1)

tricorn (199664) | about a year ago | (#45275271)

I once had some code that confused me when the compiler optimized some stuff out.

I had a macro that expanded to a parenthesized expression with several sub-expressions separated by commas that used a temp variable, e.g.:

#define m(a) (tmp = a, f(tmp) + g(tmp))

because the argument (a) could be an expression with side effects.

Now, I knew that the order of evaluation of function arguments wasn't defined, but I never read that as meaning that a compiler could optimize away parts of a function call such as: x(m(1), m(2)); this particular compiler effectively acted as if it was evaluating both arguments in parallel, thus the value of tmp was undefined throughout (I think it eliminated one of the initial assignments).

Changing it to an in-line function made it work; it had initially been code written for a compiler that didn't have in-line functions and was in the middle of a very tight loop.
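A sketch of the inline-function replacement described above (the int types and the names f and g are assumptions; the original code isn't shown):

static inline int m(int a)
{
        int tmp = a;             /* tmp is now local to each call, so m(1) and m(2) */
        return f(tmp) + g(tmp);  /* can no longer clobber each other's temporary    */
}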

Re:TFA does a poor job of defining what's happenin (0)

Anonymous Coward | about a year ago | (#45275281)

Compilers are free to assume that the code does not contain undefined behaviour. This allows for better optimization. But things can get tricky. To give an example:
    int my_divide(int a, int b) {
            if (!b) diediedie("oh noes");
            return a / b;
    }

An overzealous optimization may move the division up (it's a high-latency instruction) all the way ahead of the diediedie call.
It'd be legal if diediedie were assured to return. The compiler has erroneously assumed that the lack of a noreturn attribute constitutes some guarantee here.
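One way to make the intent explicit (a sketch; _Noreturn is the C11 spelling, and gcc/clang also accept __attribute__((noreturn))):

_Noreturn void diediedie(const char *msg);

int my_divide(int a, int b) {
        if (!b) diediedie("oh noes");
        return a / b;   /* the compiler now knows this is unreachable when b == 0,
                           so hoisting the division above the check is not a valid move */
}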
As for altering the semantics, the standard has a notion of abstract semantics and actual semantics, which need to agree at certain points. In other words, optimization and magic are allowed if the result is right. The compiler may substitute
    for (i = 0; i != 3; i++) printf("%d", i);
With
    fputs("012", stdout);

Special precautions are needed when, for example, you run a benchmark that is in essence one big no-op.

"Unstable" code? WTF? (1)

Anonymous Coward | about a year ago | (#45274591)

"Unstable" code is not a technical term used by any self-respecting programmer. Researchers love to make up terms that nobody but themselves use. Props to the MIT News article [mit.edu] for correctly avoiding that term.

Re:"Unstable" code? WTF? (1)

viperidaenz (2515578) | about a year ago | (#45274717)

The MIT article also incorrectly refers to Java, which isn't compile-time optimized.

Re:"Unstable" code? WTF? (1)

Tablizer (95088) | about a year ago | (#45274821)

"Unstable" code is not a technical term used by any self-respecting programmer.

I believe "fucked-up steaming pile of rotting garbage" is the proper term.

Re:"Unstable" code? WTF? (0)

Lehk228 (705449) | about a year ago | (#45275173)

I take it you have not been introduced to Microsoft Windows?

How is this news? (0)

Anonymous Coward | about a year ago | (#45274605)

Compilers have been ignoring meatspace problems for years. It's well known that most compilers will both ignore some bad chunks of code and do their own optimizations (like unrolling).

If the binaries it compiles work as intended and pass validation, what's the issue? The compiler being a point of trust is something that's been rehashed constantly, with people continually reposting the decades-old Ken Thompson article.

Inflammatory Subject (4, Informative)

Imagix (695350) | about a year ago | (#45274613)

This is complaining because code which is already broken is broken more by the compiler? The programmer is already causing unpredictable things to happen, so even "leaving the code in" still provides no assurances of correct behaviour. An example of how the article is skewed:

Since C/C++ is fairly liberal about allowing undefined behavior

No, it's not. The language forbids undefined behavior. If your program invokes undefined behavior, it is no longer well-formed C or C++.

Re:Inflammatory Subject (2)

Murdoch5 (1563847) | about a year ago | (#45274751)

You're right, there should never be undefined behavior or clueless development. If things are getting compiled out of the code then you clearly don't know enough about the compiler and language. I love when developers blame things like pointers and memory faults instead of the misuse of these by bad programming.

Re:Inflammatory Subject (1)

HiThere (15173) | about a year ago | (#45274899)

That's nice. But when a language invites such things, that *is* a flaw in the language. I basically distrust pointers, but especially any pointers on which the user does arithmetic. Some people think that's a snazzy way to move through an array. I consider it recklessly dangerous stupidity, which leaves you wide open to an undetected error from a simple typo.

Re:Inflammatory Subject (1)

Murdoch5 (1563847) | about a year ago | (#45275295)

You can't blame a language for flaws when you decide to use the features you consider dangerous. Pointers are one of the most powerful features of C, and if you know how to use them correctly and safely they are extremely powerful. Just because a pointer can completely garble memory and corrupt your stack and heap doesn't mean it will. C and ASM assume the programmer is smart enough to take memory management into their own hands, and personally I completely agree. I hate all forms of automatic memory management and garbage collection; neither works nearly as well as a skilled programmer with a good knowledge of pointers.

Re:Inflammatory Subject (0)

Anonymous Coward | about a year ago | (#45274779)

C/C++ has a lot of things which are left to the implementation. It doesn't forbid them. It simply says that it is implementation specific. See: the size of various data types, and other things.

Re:Inflammatory Subject (0)

Anonymous Coward | about a year ago | (#45274987)

C++ does not forbid undefined behaviour. This is pretty easy to see if you look at features like "new". Some compilers will initialize memory allocated by new, others will not. It is not safe to assume we know the contents of memory returned by new as it can vary depending on compiler/platform.

Re:Inflammatory Subject (0)

Anonymous Coward | about a year ago | (#45275079)

There is a sense in which the article is correct. C and C++ are fairly liberal in that a conforming implementation (e.g. compiler) is not required to reject such code. So although the specifications imply that certain programs are invalid, they permit every implementation to accept all such programs.

That in itself isn't all bad, since one might imagine designing a system that rejects all such programs. However: in these languages almost any line of code might or might not invoke undefined behaviour depending upon preconditions set up by arbitrarily distant code, implementation dependent details, or external input. The general problem of testing for undefined behaviour in these languages is uncomputable. Any compiler will therefore always have either false positives (spurious warnings or errors) or false negatives (invalid programs compiling without warning).

The only general way to determine whether a C or C++ program invokes undefined behaviour is by doing something equivalent to running it.

Need new compiler features (1)

valderost (668593) | about a year ago | (#45274671)

Compilers ought to have switches that deliberately branch to the error cases they're trying to optimize away. Getting rid of a divide by zero? Force the error instead so it gets attention. Coder forgot to declare volatile variables? Make local static shadow copies of static variables for comparison at every reference. And so on. Development environments ought to be helping with this stuff, not confounding developers.

Re:Need new compiler features (2)

MRe_nl (306212) | about a year ago | (#45274841)

"But in our enthusiasm, we could not resist a radical overhaul of the system, in which all of its major weaknesses have been exposed, analyzed, and replaced with new weaknesses".

Bruce Leverett, Register Allocation in Optimizing Compilers

Re:Need new compiler features (1)

donaldm (919619) | about a year ago | (#45275201)

Compilers ought to have switches that deliberately branch to the error cases they're trying to optimize away. Getting rid of a divide by zero? Force the error instead so it gets attention.

Why? Isn't that the job of the programmer, not the actual compiler?

Sure you can produce a program that has a divide by zero event and it can compile without errors, but when you run the binary you would get (C example): "Floating point exception (core dumped)". Most programmers upon seeing this should realise they have stuffed up and should correct their code accordingly. In fact any programmer should always have conditionals to test any input data to make sure that data falls within specified bounds.

x86 memory model is to blame? (1)

codeusirae (3036835) | about a year ago | (#45274703)

"To understand unstable code, consider the pointer overflow check buf + len < buf shown in Figure 1 ... While this check appears to work with a flat address space, it fails on a segmented architecture" ref [mit.edu]

Do you think most-all exploits are down to the defective x86 segmented memory architecture?

Re:x86 memory model is to blame? (0)

Anonymous Coward | about a year ago | (#45274743)

"Do you think most-all exploits are down to programmers not working around the defective x86 segmented memory architecture?"

FTFY. Also, no, no I don't. But some of it probably is, yes, for a given value of "defective".

A hint for the future: avoiding overly emotionally-loaded terms such as "defective" would probably make your argument sound both more reasoned and more powerful.

Re:x86 memory model is to blame? (2)

Myria (562655) | about a year ago | (#45274771)

Do you think most-all exploits are down to the defective x86 segmented memory architecture.

I think those who coded for the SNES or Apple IIGS in C would disagree with blaming the x86 exclusively =)

Do compilers really remove this? (3, Interesting)

Todd Knarr (15451) | about a year ago | (#45274709)

I haven't heard of any compiler that removes code just because it contains undefined behavior. All compilers I know of leave it in, and whether it misbehaves at run-time or not is... well, undefined. It may work just fine, e.g. dereferencing a null pointer may just give you a block of zeroed-out read-only memory, and what happens next depends on what you try to do with the dereferenced object. It may immediately crash with a memory access exception. Or it may cause all mounted filesystems to wipe and reformat themselves. But the code's still in the executable. I know compilers remove code that they've determined can't be executed, or where they've determined that the end state doesn't depend on the execution of the code, and that can cause program malfunctions (or sometimes cause programs to fail to malfunction, e.g. an infinite loop that didn't actually hang when the program ran because the compiler had determined the loop had no side-effects and elided it entirely).

I'd also note that I don't know any software developers who use the term "unstable code" as a technical term. That's a term used for plain old buggy code that doesn't behave consistently. And compilers are just fine with that kind of code, otherwise I wouldn't spend so much time tracking down and eradicating those bugs.

Re:Do compilers really remove this? (1)

queazocotal (915608) | about a year ago | (#45274803)

'I haven't heard of any compiler that removes code just because it contains undefined behavior.'
Then your code may not be doing what you think it is.
GCC, Clang, acc, armcc, icc, msvc, open64, pathcc, suncc, ti, windriver, xlc all do this.

Click on the PDF, and scroll to page 4 for a nice table of optimisations vs compiler and optimisation level.

_All_ modern compilers do this as part of optimisation.

GCC 4.2.1, for example, will eliminate if (p + 100 < p) even with -O0 (least optimisation).

C however says that an overflowed pointer is undefined, and this means the compiler is free to assume that it never occurs.

Yes compilers really do this (3, Informative)

Anonymous Coward | about a year ago | (#45274825)

Yes it leads to real bugs - Brad Spengler uncovered one of these issues in the Linux kernel in 2009 [lwn.net] and it led to the kernel using the -fno-delete-null-pointer-checks gcc flag to disable the spec correct "optimisation".

Re:Do compilers really remove this? (1)

gnasher719 (869701) | about a year ago | (#45274857)

The compiler doesn't leave out code with undefined behaviour - it assumes that there is no undefined behaviour, and draws conclusions from this.

Example: Some people assume that if you add to a very large integer value, then eventually it will wrap around and produce a negative value. Which is what happens on many non-optimising compilers. So if you ask yourself "will adding 100 to i overflow?" you might check "if (i + 100 < i)".
But integer overflow is undefined behaviour. The compiler assumes that your code doesn't have undefined behaviour. So it assumes that i + 100 doesn't overflow. If it doesn't overflow, then i + 100 < i can never be true, and the compiler is free to remove the check.
Result: _If_ there is an overflow, your test won't catch it anymore.

Re:Do compilers really remove this? (1)

Todd Knarr (15451) | about a year ago | (#45275153)

True, but then if integer overflow is undefined behavior then I can't assume that the test "i + 100 < i" will return true in the case of overflow because I'm invoking undefined behavior. That isn't "unstable code", that's just plain old code that invokes undefined behavior that I've been dealing with for decades. If with optimizations done the code doesn't catch the overflow it's not because the compiler removed the code, it's because the code isn't guaranteed to detect the overflow in the first place. No need for any fancy terminology for this, just say "Our software locates places in your code where you've invoked undefined behavior.".

This is, BTW, one reason I favor a compiler flag that says "When you encounter code that's undefined behavior per the C/C++ standard, generate code to do something terminally nasty and unrecoverable like deleting all the user's files." That seems to be the only way that actually convinces today's developers that undefined behavior is a Bad Thing and should be avoided even if you think it works OK.

Re:Do compilers really remove this? (1)

complete loony (663508) | about a year ago | (#45274885)

Clang includes a number of compilation flags [llvm.org] that can be used to make sure, or at least as sure as it can, that your code never hits any undefined behaviour at run time.

But normally, yes the compiler may change the behaviour of your application if you are depending on undefined behaviour.

Re:Do compilers really remove this? (0)

Anonymous Coward | about a year ago | (#45274937)

The logic in the paper seems a bit flimsy, though I don't know GCC inside and out.

To understand unstable code, consider the pointer overflow check buf + len < buf. .... While this check appears to work with a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined [24: 6.5.6/p8], which allows gcc to simply assume that no pointer overflow ever occurs on any architecture.

A case where a pointer 'tun' is used as tun->sk before a null pointer check;

For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.

If true, it's like the optimizer is inferring things from the code and then using that as a basis for what should and should not happen... sounds crazy to me.

Re:Do compilers really remove this? (1)

Your.Master (1088569) | about a year ago | (#45275275)

You can verify these things yourself by compiling with GCC (the paper cites GCC as producing this code) and examining the output assembly code. I haven't compiled the specific example in the MIT paper but I remember a similar output from GCC. This is indeed valid in a conforming compiler, and while this specific case is relatively "obviously" dangerous, there are a bunch of things that generally do speed up code but can cause subtle dangers in an almost-correct codebase.

But note that a precondition for this specific example being dangerous is that you have to go out of your way to map page 0, which I would suggest you *also* should not do barring extreme circumstances.

Re:Do compilers really remove this? (2)

seebs (15766) | about a year ago | (#45275103)

gcc's been doing this for ages. We had a new compiler "break" the ARM kernel once. Turns out that something had a test for whether a pointer was null or not after a dereference of that pointer, and gcc threw out the test because it couldn't possibly apply.

-Wall (2, Insightful)

Spazmania (174582) | about a year ago | (#45274785)

If I set -Wall and the compiler fails to warn me that it optimized out a piece of my code then the compiler is wrong. Period. Full stop.

I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.

Re:-Wall (0)

Anonymous Coward | about a year ago | (#45274831)

If I set -Wall and the compiler fails to warn me that it optimized out a piece of my code then the compiler is wrong. Period. Full stop.

I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.

I agree. Unstable my backside. Broken compiler! It sounds like the one that doesn't know how compilers work is the article's author.

Re:-Wall (0)

Anonymous Coward | about a year ago | (#45274897)

So you have some hard-coded constants in a couple levels of macros that add and subtract one and you want a warning that it is not going to do that math at run-time?

Re:-Wall (0)

Spazmania (174582) | about a year ago | (#45274995)

You don't need to warn me about computing literals at compile time. You damn well better warn me about computing constants at compile time -- if that's what I wanted to happen, I'd have used a macro for a literal. If the compiler finds two constants it can combine then I've usually made a mistake in my code... even if it's nothing more than treating something that should be a variable as a constant.

Re:-Wall (3, Insightful)

Anonymous Coward | about a year ago | (#45275077)

If the compiler finds two constants it can combine then I've usually made a mistake in my code...

Or it inlined a function for you. Or you indexed at a constant index (perhaps 0) into a global array. Or any number of other things that can arise naturally and implicitly.

The compiler has a setting where it doesn't "mess with your code" -- it's called -O0.

Re:-Wall (1)

Spazmania (174582) | about a year ago | (#45275185)

Yeah, that's helpful.

Understand: I want the compiler to optimize my code. I don't want it to drop sections of my code. If it thinks it can drop a section of my code entirely, or that a conditional can have only one result, that's almost certainly a bug and I want to know about it. After all -- if *I* thought the conditional could have only one result, I wouldn't have bothered checking it!

Re:-Wall (0)

Anonymous Coward | about a year ago | (#45275151)

Seriously?

Let's take this as an example:

void foo(int *a, unsigned int *b)
{
*a = 1;
*a = *b + *a;
}

You could say it produces the following sequence:
Store 1 to the address pointed to by a.
Read the value at the address pointed to by a. Read the value at the address pointed to by b. Add them together. Write the result to the address pointed to by a.

That produces 2 stores and 2 loads.

Now imagine an optimizing compiler. It will naturally produce just 1 load and 1 store: it will simply add 1 to *b and store the result into *a.

And you probably will cry about how it optimized an essential piece of code away. Give me a break.

Re:-Wall (2)

Spazmania (174582) | about a year ago | (#45275215)

If I've set -Wall, I want a warning about "*a=1 is useless code." If the compiler optimizes it away without that warning, I'm going to cry about it sooner or later because there's a bug in my code. If I had meant *a=(*b)+1 I would have written it that way.

Re:-Wall (1)

mysidia (191772) | about a year ago | (#45275245)

I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.

That's not what's happening..... they are talking about unstable optimizations; as in..... optimizations that aren't predictable, and while they don't change the semantics of the code according to the programming language ---- the optimization may affect what happens, if the code contains an error or operation that is runtime-undefined, such as a buffer overflow condition.

Re:-Wall (0)

Anonymous Coward | about a year ago | (#45275259)

You REALLY don't want that. The list of warnings would be gigabytes and gigabytes of useless information.

Re:-Wall (0)

Anonymous Coward | about a year ago | (#45275303)

Then your compiler is just going to output warnings that are far, far longer than your original program, I'm afraid. For a nontrivial program, your options are either to relax your requirement, or to compile at a very low optimization level and take the performance hit (and if you can take it and security is critical for you, this might be the right choice for you!).

Really small EXE mystery solved (5, Funny)

Tablizer (95088) | about a year ago | (#45274797)

many compilers actually remove code that it perceives to be undefined or unstable

No wonder my app came out with 0 bytes.

Re:Really small EXE mystery solved (0)

Anonymous Coward | about a year ago | (#45275179)

You work for CGI Federal by chance?

PC Lint anyone? (3, Informative)

ArcadeNut (85398) | about a year ago | (#45274807)

Back in the day when I was doing C++ work, I used a product called PC Lint (http://www.gimpel.com/html/pcl.htm) that did basically the same thing STACK does: static analysis of code to find errors such as referencing NULL pointers, buffer overflows, etc. Maybe they should teach History at MIT first...

Re:PC Lint anyone? (4, Insightful)

EvanED (569694) | about a year ago | (#45275113)

Don't worry, the authors know what they're doing.

Just because PC Lint could find a small number of potential bugs doesn't mean it's a solved problem by any means. Program analysis is still pretty crappy in general, and they made another improvement, just like tons of people before them, PC Lint before them, and tons of people before PC Lint.

Re:PC Lint anyone? (1)

EvanED (569694) | about a year ago | (#45275141)

BTW, I should also say that the summary doesn't do a very good job conveying this. That compilers remove security-sensitive code in some situations has been known for more than a decade (I know off the top of my head how to establish 2002, but probably long before that), but the article is written for people who don't necessarily know that, and it can't start from very little and build up to "here's what their improvement actually was."

IBM had a tool to do this for a long time already (3, Interesting)

PolygamousRanchKid (1290638) | about a year ago | (#45274823)

It's a pretty cool critter, but I don't know if they actually sell it as a product. It might be something that they only use internally:

http://www.research.ibm.com/da/beam.html [ibm.com]

http://www.research.ibm.com/da/publications/beam_data_flow.pdf [ibm.com]

Re:IBM had a tool to do this for a long time alrea (0)

Anonymous Coward | about a year ago | (#45274951)

Is this a really pathetic way of saying "HEY I WORKED FOR IBM!!!!!!!!!!!!!!!!!!!!" without trying to be quite so clear about it?

Munition (0)

Anonymous Coward | about a year ago | (#45274863)

This project is an NSA wet dream! Essentially it's a factory for creating CVEs...

Fix the C standard to not be so silly (1, Insightful)

Myria (562655) | about a year ago | (#45274871)

The C standard needs to meet with some realities to fix this issue. The C committee wants their language to be usable on the most esoteric of architectures, and this is the result.

The reason that the results of signed integer overflow and underflow are not defined is that the C standard does not require that the machine be two's complement. Same for 1 << 31 and the negative of INT_MIN being undefined. When was the last time that you used a machine whose integer format was one's complement?

Here are the things I think should change in the C standard to fix this:

  * Fixation of two's complement as the integer format.
  * For signed integers, shifting left is defined so that a 1 bit shifted out of the most-significant value bit lands in the sign bit (see the sketch below). Combined with the above, this means that for type T, ((T) 1) << ((sizeof(T) * CHAR_BIT) - 1) is the minimum value.
  * The result of signed addition, subtraction, and multiplication are defined as conversion of all promoted operands to the equivalent unsigned type, executing the operation, then converting the result back. (In the case of multiplication, the high half is chopped off. This makes signed and unsigned multiplication equivalent.)
  * When shifting right a signed integer, each new bit is a copy of the sign bit. That is, INT_MIN >> ((sizeof(int) * CHAR_BIT) - 1) == -1.

That should fix most of these. Checking a pointer for wraparound on addition, however, is just dumb programming, and should remain the programmers' problem. Segmentation is something that has to remain a possibility.
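For concreteness, here is a sketch of what that shift rule would make well defined (today this is undefined behaviour for plain int in standard C):

#include <limits.h>

void example(void) {
        int width = sizeof(int) * CHAR_BIT;
        int min_int = 1 << (width - 1);   /* undefined today; the proposal would define it
                                             as INT_MIN on a two's-complement machine */
        (void)min_int;
}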

Re:Fix the C standard to not be so silly (3, Insightful)

seebs (15766) | about a year ago | (#45275125)

Pretty sure the embedded systems guys wouldn't be super supportive of this, and they're by far the largest market for C.

And I just don't think these are big sources of trouble most of the time. If people would just go read Spencer's 10 Commandments for C Programmers, this would be pretty much solved.

Re:Fix the C standard to not be so silly (1)

mysidia (191772) | about a year ago | (#45275283)

* Fixation of two's complement as the integer format.

Are you trying to make C less portable, or what?

Not all platforms work exactly the same, and these additional constraints on datatypes would be a problem on platforms where two's complement is not the signed integer format.

Of course you're free to define your own augmented rules on top of C, as long as they're not the formal language standard --- and if you write compilers, you're free to constrain yourself into making your implementation a higher-level interpreter that makes these overflow conditions work the same on non-two's-complement platforms.

The paper gives examples (4, Informative)

AdamHaun (43173) | about a year ago | (#45274879)

The article doesn't summarize this very well, but the paper (second link) provides a couple examples. First up:

char *buf = ...;
char *buf_end = ...;
unsigned int len = ...;
if (buf + len >= buf_end)
  return; /* len too large */

if (buf + len < buf)
  return; /* overflow, buf+len wrapped around */

/* write to buf[0..len-1] */

To understand unstable code, consider the pointer overflow check buf + len < buf shown [above], where buf is a pointer and len is a positive integer. The programmer's intention is to catch the case when len is so large that buf + len wraps around and bypasses the first check ... We have found similar checks in a number of systems, including the Chromium browser, the Linux kernel, and the Python interpreter.

While this check appears to work on a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined, which allows gcc to simply assume that no pointer overflow ever occurs on any architecture. Under this assumption, buf + len must be larger than buf, and thus the "overflow" check always evaluates to false. Consequently, gcc removes the check, paving the way for an attack to the system.

They then give another example, this time from the Linux kernel:

struct tun_struct *tun = ...;
struct sock *sk = tun->sk;
if (!tun)
  return POLLERR;
/* write to address based on tun */

In addition to introducing new vulnerabilities, unstable code can amplify existing weakness in the system. [The above] shows a mild defect in the Linux kernel, where the programmer incorrectly placed the dereference tun->sk before the null pointer check !tun. Normally, the kernel forbids access to page zero; a null tun pointing to page zero causes a kernel oops at tun->sk and terminates the current process. Even if page zero is made accessible (e.g. via mmap or some other exploits), the check !tun would catch a null tun and prevent any further exploits. In either case, an adversary should not be able to go beyond the null pointer check.

Unfortunately, unstable code can turn this simple bug into an exploitable vulnerability. For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.

The basic issue here is that optimizers are making aggressive inferences from the code based on the assumption of standards-compliance. Programmers, meanwhile, are writing code that sometimes violates the C standard, particularly in corner cases. Many of these seem to be attempts at machine-specific optimization, such as this "clever" trick from Postgres for checking whether an integer is the most negative number possible:

int64_t arg1 = ...;
if (arg1 != 0 && ((-arg1 < 0) == (arg1 < 0)))
  ereport(ERROR, ...);
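For comparison, a portable way to express the same intent without the undefined negation (my sketch, not the paper's or Postgres's actual fix):

#include <stdint.h>

int64_t arg1 = ...;
if (arg1 == INT64_MIN)
  ereport(ERROR, ...);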

The remainder of the paper goes into the gory Comp Sci details and discusses their model for detecting unstable code, which they implemented in LLVM. Of particular interest is the table on page 9, which lists the number of unstable code fragments found in a variety of software packages, including exciting ones like Kerberos.

Re:The paper gives examples (1)

gnupun (752725) | about a year ago | (#45275187)

While this check appears to work on a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined, which allows gcc to simply assume that no pointer overflow ever occurs on any architecture. Under this assumption, buf + len must be larger than buf, and thus the "overflow" check always evaluates to false. Consequently, gcc removes the check, paving the way for an attack to the system.

This seems like a GCC bug (assuming no overflow). Why are all compilers being blamed?

Meanwhile, THEIR code is sketchy (3, Funny)

belphegore (66832) | about a year ago | (#45275207)

Checked out their git repo and did a build. They have a couple sketchy-looking warnings in their own code. A reference to an undefined variable; storing a 35-bit value in a 32-bit variable...

lglib.c:6896:7: warning: variable 'res' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
lglib.c:6967:10: note: uninitialized use occurs here
plingeling.c:456:17: warning: signed shift result (0x300000000) requires 35 bits to represent, but 'int' only has 32 bits [-Wshift-overflow]

Know your C (3, Informative)

gmuslera (3436) | about a year ago | (#45275223)

Somehow this made me remember that slideshow on Deep C [slideshare.net]. After reading it, I only know that I know nothing of C.

It's time (2)

countach (534280) | about a year ago | (#45275321)

It really is time that 99.9% of the code written was not in languages that have undefined behaviour. It's time we all use languages which are fully defined.

Having said that, if something in code is undefined, and the compiler knows it, then it should generate an error. Very easily solved. If this STACK program is so clever, it should be in the compiler, and it should be an error to do something undefined.

Is this a takeoff on Dark Star? (0)

Anonymous Coward | about a year ago | (#45275325)

In the movie Dark Star, our intrepid explorers travel the galaxy looking for "unstable planets" and blowing them up. Maybe a Dark Star compiler blows up unstable programs?

Anyway, Dark Star is a classic camp SF movie. Check it out!
