DieHard, the Software 230
Roland Piquepaille writes "No, it's not another movie sequel. DieHard is a piece of software that helps programs run correctly and protects them from a range of security vulnerabilities. It has been developed by computer scientists from the University of Massachusetts Amherst — and Microsoft. DieHard prevents crashes and hacker attacks by focusing on memory. Our computers have thousands of times more memory than they did 20 years ago. Still, programmers privilege speed and efficiency over security, which leads to the famous "buffer overflows" that hackers exploit."
Vista already doing some of this (Score:5, Informative)
Re:Vista already doing some of this (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
Is this feature standard in Linux yet? I'd hate to see us OSS guys get shown up by Bill...
Re:Vista already doing some of this (Score:5, Informative)
Re: (Score:2)
You do know *BSD is open source software these days, right?
Re: (Score:2)
Re:Vista already doing some of this (Score:5, Informative)
DieHard's randomization is very different from what OpenBSD does, not to mention Vista's address-space randomization. I've added a note to the FAQs that explains the difference in some detail, and answers several other questions, but in short: "address-space randomization" randomizes the base address of the heap and also mmapped chunks of memory, leaving the relative position of objects intact. By contrast, DieHard randomizes the location of every single object across the entire heap. It also goes further in that it prevents a wide range of memory errors automatically, like double frees and illegal frees, and effectively eliminates heap corruption.
-- Emery Berger
Re:Vista already doing some of this (Score:5, Informative)
http://kerneltrap.org/node/5584 [kerneltrap.org]
Any thoughts?
Re:Vista already doing some of this (Score:5, Informative)
Here's a more detailed answer -- I'll add it to the FAQ.
OpenBSD (a variant of PHKmalloc) does some of what DieHard's allocator does, but DieHard does much more. On the security side, DieHard adds much more "entropy"; on the reliability side, it mathematically reduces the risk that a programmer bug will have any impact on program execution.
OpenBSD randomly locates pages of memory and allocates small objects from these pages. It improves security by avoiding the effect of certain errors. Like DieHard, it is resilient to double and invalid frees. It places guard pages around large chunks and frees such large chunks back to the OS (causing later references through dangling pointers to fail unless the chunk is reused). It attempts to block some buffer overflows by using page protection. Finally, it shuffles some allocated objects around on a page, randomizing their location within a page.
DieHard goes much further. First, it completely segregates heap metadata from the heap, making heap corruption (and hijack attacks) nearly impossible. A large-enough underflow on OpenBSD can overwrite the page directory or the local page info struct (at the beginning of each page), hijacking the allocator. This presentation [ruxcon.org.au] describes several ways OpenBSD's allocator can be attacked. By contrast, none of DieHard's metadata is located in the allocated object space.
Second, DieHard randomizes the placement of objects across the entire heap. This has numerous advantages. On the security side, it makes brute-force attempts to locate adjacent objects nearly impossible -- in OpenBSD, knowing the allocation sequence determines which pages objects will land on (see the presentation pointed to above).
DieHard's complete randomization is key to provably avoiding a range of errors with high probability. It reduces the worst-case odds that a buffer overflow has any impact to 50%. The actual likelihood is even lower when the heap is not full. DieHard also avoids dangling pointer errors with very high probability (e.g., 99.999%), making it nearly impervious to such mistakes. You can read our PLDI paper for more details and formulae.
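Here's a toy sketch of the two key mechanisms (illustrative only -- this is not DieHard's actual code, and the names and sizes are invented): objects are placed uniformly at random across an over-provisioned heap, and the allocation metadata is a bitmap kept entirely outside the heap, so overflows cannot corrupt it and bogus frees are rejected rather than trusted.
/* Toy randomized allocator -- an illustration, not DieHard's code.
 * One size class; the heap has room for HEAP_SLOTS objects but is
 * kept at most half full (expansion factor M = 2). */
#include <stdint.h>
#include <stdlib.h>

#define SLOT_SIZE  64
#define HEAP_SLOTS 1024                /* power of two for cheap masking */

static uint8_t heap[HEAP_SLOTS * SLOT_SIZE];
static uint8_t live[HEAP_SLOTS];       /* metadata lives OUTSIDE the heap */
static size_t  used;

void *rand_alloc(void) {
    if (used >= HEAP_SLOTS / 2)        /* refuse to fill past half */
        return NULL;
    for (;;) {                         /* rejection-sample a free slot */
        size_t i = (size_t)rand() & (HEAP_SLOTS - 1);
        if (!live[i]) {
            live[i] = 1;
            used++;
            return &heap[i * SLOT_SIZE];
        }
    }
}

void rand_free(void *p) {
    size_t off = (size_t)((uint8_t *)p - heap);
    size_t i   = off / SLOT_SIZE;
    /* invalid and double frees are detected, not obeyed */
    if (i >= HEAP_SLOTS || off % SLOT_SIZE || !live[i])
        return;
    live[i] = 0;
    used--;
}
Intuitively, because the heap is kept at most half full, the slot an overflow spills into is free at least half the time -- that's where the 50% worst-case bound comes from.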
-- Emery Berger
Re: (Score:3, Informative)
Re: (Score:2)
Re:Vista already doing some of this (Score:4, Insightful)
Re:Vista already doing some of this (Score:5, Interesting)
Part of the other problem is that most home users expect secure data, but they aren't willing to do anything about it (e.g., set up non-admin users, or install virus checkers, firewalls, etc.).
Re:Vista already doing some of this (Score:5, Informative)
The real solution is programming in a language with secure memory management, such as
Re: (Score:2)
If your OS can only be trusted in a virtual environment, what's the point of using an OS at all?
Re: (Score:2)
or randomly locate virtual memory around the HD without regard to pre-existing magnetic conditions.
;-)
Different program? (Score:2, Informative)
Re:Different program? (Score:5, Funny)
Re: (Score:3, Funny)
Re: (Score:2)
That is so 1995. Update to 2007 and call it "Live Free or DieHard"
Re: (Score:2)
So it is. Speaking of which, does anyone here know how to interpret the numbers it generates? I ran it on the deadbeef random number generator [inglorion.net] a while ago (test results linked from that page), and my interpretation is that deadbeef_rand does well on some tests and very poorly on others. Am I right? Can one distill from Diehard's output what the weaknesses of the PRNG are?
Re:Different program? (Score:5, Funny)
Re:Different program? (Score:5, Funny)
Re: (Score:2, Insightful)
Correction (Score:5, Insightful)
Speed and efficiency of *development*, maybe.
Which is the problem. Modern software is so dependent on toolkits, compiler optimizations, and various other "pre-made" pieces that any program of even moderate complexity is doing things the programmer isn't really aware of.
Re:Correction (Score:5, Insightful)
nothing to do with VMs - just exception handling (Score:3, Informative)
Re:Correction (Score:5, Insightful)
No, it was right the first time. Java is several orders of magnitude more secure by default than any random C or C++ program. Yet mention Java on a forum like, say, Slashdot, and you'll hear no end of how much Java sucks because "it's slow" (usually ignoring the massive speedups that have happened since they last tried it in 1996). It doesn't matter that the tradeoff for that speed is flexibility, security, and portability. They want things to be fast for some undefined notion of fast.
In fact, I predict that someone will be along to argue just how slow Java is in 3... 2... 1...
Re: (Score:2)
Re:Correction (Score:4, Interesting)
I've got to call you on the "portability" crap.
Java is about as portable as Flash... Sure, the major platforms are supported, but that's it. Third parties spent a lot of time trying to implement Java, but never did get everything 100% right. Licensing issues, above all else, made it a real hassle to get Java on platforms like FreeBSD.
Meanwhile, C and C++ compilers are installed in the base system by default.
The only "portability" advantage Java has is perhaps in GUI apps, and that's at the expense of a program that doesn't look or work remotely like any other app on the system...
There are a great many reasons people don't use Java. Performance is only a minor one.
Re: (Score:2)
Re:Correction (Score:5, Insightful)
The basic complaints I have heard are these:
Complaint 1: Java is slow.
As you stated, this is not a meaningful complaint.
Complaint 2: Garbage Collection stinks
GC is an obvious requirement of a "safe" language. As implemented in Java, it is downright stupid: when doing something CPU-intensive, the GC never runs, so the program gobbles up memory until there is no more and then thrashes to death. I'm sure that somebody is going to dig up that paging-free GC paper, but pay attention: that is a kernel-level GC.
Complaint 3: Swing is ugly/leaks memory
The first is a matter of opinion. The second is well-known. Swing keeps references to long-dead components hidden in internal collections leading to massive memory leaks. These memory leaks can be propagated to the parent application if it is also written in Java.
Complaint 4: Bad build system
Java cannot do incremental builds if class files have circular references. In a small project of about ten classes I was working on, the only way to build it was "rm *.class ; javac *.java"
Complaint 5: Tied class hierarchy to filesystem hierarchy
This is just stupid and interacts badly with Windows (and anything else with a case-insensitive filesystem). It is even worse for someone who is first learning the language. It also means that renaming a class has a very bad effect on source control.
Complaint 6: Lack of C++ templates
C++ has some of its own faults. Fortunately its template system can be leveraged to fix quite a few of them. Java's generics have insufficient power to do the same thing.
Complaint 7: Lack of unsigned integer
These are oh-so-necessary when doing all kinds of things with binary formats. Too bad Java and all its descendants don't have them.
Complaint 8: Verbosity without a point
It has gotten so bad in places that I am strongly tempted to pass Java through the C preprocessor first, but I can't do that very well because of complaint 4.
Corrections (Score:3, Informative)
Swing does not really have the problems you speak of any longer, if you are using it right... heck, it didn't really have those problems to any great degree about seven years ago, when I was building a large custom client app entirely in Swing for desktop-only deployment.
Complaining about the build sys
Re: (Score:2)
emphasis added
I'm very surprised that no one else jumped on this one. I've never seen a well designed app that had circular references of any sort. I'll stipulate that such probably do exist, as there seems to be a case for doing things that would otherwise be dumb for just ab
Re: (Score:2)
No, it isn't. Memory safety can be achieved without the use of garbage collection, by avoiding reallocation of freed memory to a differently-typed object. Accesses through freed pointers can be caught by page-table manipulation. Sure, there's an overhead to these techniques, but then there's a non-trivial overhead to GC as well.
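For instance, here's the page-table half of that in a few lines (a POSIX mmap/mprotect sketch; the helper names are mine): give each object its own page(s), and on free make the pages inaccessible instead of recycling them.
/* GC-free safety for dangling pointers: one object per page(s),
 * and freed pages become inaccessible rather than reused. */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

void *page_alloc(size_t n) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len  = ((n + page - 1) / page) * page;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

void page_free(void *p, size_t n) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len  = ((n + page - 1) / page) * page;
    /* Don't recycle: any later access through a dangling pointer
     * now faults instead of silently reading another object. */
    mprotect(p, len, PROT_NONE);
}
The overhead is exactly as advertised: up to a page per object, and address space that is never reused.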
I'm sure that somebody is going to dig up that paging-free GC paper, but pay attention: that is a kernel-level GC.
Which just indicates tha
Re: (Score:2)
"12,2a,3f".split(',').collect {|num| num.to_i 16 }
=> [18, 42, 63]
And I got to write/test it interactively... I do agree about needless verbosity. Most programs would be better finished than optimized: code them in Ruby or Python and optimize any speed-critical areas with C or anything else. C *is* premature optimization.
Re: (Score:2)
Re: (Score:2)
id10t
Re: (Score:2)
Re: (Score:2, Insightful)
Do you know what "several orders of magnitude" means? For variety, next time you should write "... exponentially more secure ..." or "... takes security to the next level!"
BTW, it's funny you should mention Java performance in this thread -- one of the DieHard authors published this fascinating paper on Java GC performance: http://citeseer.ist.psu.edu/hertz05quantifying.html [psu.edu] -- executive summary: GC can theore
Re: (Score:2)
And I agree, speaking of "several orders of magnitude" is taking it a bit far. It's much more secure, but since security is *very
Re: (Score:2)
Other issues, let's see: for some reason, Firefox hangs when more than one Java app is run in the browser. I see a lot of Java apps where dialogs are non-modal (you can access/view one window at a time). Java apps often *requi
Re: (Score:2)
Re:Correction (Score:4, Insightful)
Maybe, but most programs are not written in a way which will achieve this goal.
Programmer time is a limited resource. This is true even on a hobby project with no deadlines and everybody working for free; you want to ship sometime. Making programs run fast takes a lot of programmer time, even when you use a language which is supposedly fast by default such as C or C++.
C and C++ make you spend a lot of time working around weaknesses in the language and fixing bugs that other languages can never have. A great deal of programmer time is put into developing the broken and slow implementation of half of Common Lisp that every sufficiently complex program must contain.
All of this time spent is time that does not go into making the program fast.
By using a language that makes programmers more productive, you get a lot more time to make the program fast. You can do this by optimizing in the "slow" language you started with, by rewriting inner loops in C, by changing the whole algorithm to run on the GPU, etc.
The 90/10 rule says that your program spends 90% of its time in only 10% of its code, and that optimizing the other 90% of the code is basically a waste. And yet people who want their programs to "go fast" are writing that 90% in a low-level language, effectively wasting a large amount of effort.
You may also end up getting your program working, realize that it actually is fast enough despite being written in a really slow interpreted language, and spend the time you saved making more cool software. Or you can go back and make the original product fast. It's up to you.
There are many good reasons to use C, and many good reasons to write entire programs in C, but "it's fast" is not a particularly good reason. An app written in pure C is probably not as fast as it can be unless its scope is very limited.
Re: (Score:2)
One man's weakness is another man's strength. If you have to spend a lot of time implementing functionality missing from a language, perhaps you chose the wrong tool for the job. This is one of my peeves with th
Re:Correction (Score:4, Interesting)
1. The fastest programs are written in C/C++
2. On average, Lisp/Scheme programs are fastest, followed by C/C++ programs, and Java programs are way behind.
3. Development time is shortest for Lisp/Scheme, with Java and C++ being more or less equal.
4. C/C++ programs used the least memory, with Lisp/Scheme and Java being about equal.
5. There was very little variation in the run time and development time of Lisp/Scheme programs, and a lot of variation everywhere else.
The PDF contains some nice graphs illustrating all this.
Re:Correction (Score:4, Insightful)
Business software isn't the problem. The software that is the problem is the software that runs on every naive home user's PC
Berger sounds like a VM-language bigot (or a paid shill: $30K from MS)
who doesn't understand how most software is really made, and prefers to believe in caricatures of programmers.
Great, you've called the guy a bigot, a shill, and an idiot, without even having understood what he was talking about.
Re: (Score:2)
Which means even less emphasis on security.
Seems to me he's a researcher who's noticed that "most software is really made" by C and C++ programmers in a hurry. He's addressing the needs of developers
Re: (Score:2)
I wondered where all those vulnerabilities were coming from. It's not humans misusing memory references and overrunning ad hoc fixed length buffers, etc. It's the toolkits, libraries and compilers! Glad we got that figured out.
From the post:
Our computers have thousands times more memory than 20 years ago. Still, programmers are privileging speed and efficiency
Re: (Score:3, Insightful)
This implies that because memory is larger, less attention can be paid to efficiency, but the hapless programmers don't know better. I used to use quicksort when I had 640 KiB of RAM, but now that I have 8 GiB, I'll just use bubble sort. Brilliant.
You are really misrepresenting his point here. We both know that bubble sort would run much slower than quicksort on an 8 GB dataset. The real comparison is "should we write some really tricky and nasty code for this particular function, or should it be a giant lookup table?" When memory is (relatively) cheaper than processor time, the set of tradeoffs changes. Some of these tradeoffs then mean that code can be written more correctly (securely) at the expense of higher memory usage. These tradeoffs are intuitively
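To make the lookup-table side of that tradeoff concrete (a hypothetical example, not from TFA): counting set bits can be done with fiddly shift-and-mask tricks, or with a 256-entry table that costs memory but is trivial to verify.
/* Memory-for-clarity tradeoff: a table replaces a "tricky and nasty"
 * bit hack. init_popcount() must run once before popcount32(). */
#include <stdint.h>

static uint8_t popcount_table[256];

void init_popcount(void) {
    for (int i = 1; i < 256; i++)   /* table[i] = table[i/2] + low bit */
        popcount_table[i] = (uint8_t)((i & 1) + popcount_table[i >> 1]);
}

int popcount32(uint32_t x) {
    return popcount_table[x & 0xff]
         + popcount_table[(x >> 8) & 0xff]
         + popcount_table[(x >> 16) & 0xff]
         + popcount_table[x >> 24];
}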
Um, no. (Score:2)
The fact that Microsoft doesn't HAVE a security model, IE/Outlook are jokes, and users run as admin has a bit more to do with it.
Re: (Score:2)
And it is almost as easy to exploit. Instead of overflowing to the return address, you overflow to the nearest vptr (if C++ is being used), to the nearest function pointer, or to the nearest green bit.
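A contrived C illustration of the function-pointer case (layout and padding are compiler-dependent, so treat this as a sketch, not a working exploit):
/* Overflowing buf clobbers the adjacent function pointer, so the
 * next indirect call goes wherever the attacker's bytes point. */
#include <stdio.h>
#include <string.h>

struct handler {
    char buf[16];
    void (*cb)(void);   /* sits right after the buffer on most ABIs */
};

void expected(void) { puts("expected path"); }

int main(void) {
    struct handler h = { .cb = expected };
    /* 24 bytes into a 16-byte buffer: the last 8 overwrite h.cb.
     * A real attack would supply a chosen address; here it just
     * jumps to 0x4242424242424242 and crashes. */
    memcpy(h.buf, "AAAAAAAAAAAAAAAABBBBBBBB", 24);
    h.cb();
    return 0;
}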
Not quite (Score:2)
Re: (Score:2)
Maybe not "inherently bad", but certainly inefficient as regards speed and space since the entire array must be copied to the stack rather than just an array pointer.
Re: (Score:2)
void foo(void) {
    int arr1[52];   /* just a stack allocation; nothing gets copied */
}
The only overhead here was moving the stack pointer down an extra 52*4 bytes, which is no more work than what it was doing already, assuming you are in a language that doesn't initialize every element of an array when you declare it. Arrays on the stack are not inherently inefficient, although they certainly c
Re: (Score:2)
foo(char str[BIG_NUMBER])
{
Re: (Score:2)
I am amazed that you would so arrogantly declare that simply doing a bit of static analysis would be sufficient to fix all (or even most) buffer overflows in complex programs with hundreds of thousands or even millions of lines of code. It sounds like you just looked at one tutorial of
Let's solve voting security. (Score:3)
Re: (Score:2)
Re: (Score:2)
Algorithms demand perfection (Score:2)
Write the algorithms correctly and there won't BE any buffer-overflows.
What's so hard about this?
You must remember... (Score:5, Insightful)
Even assuming nobody wants to go to all that trouble, there are solutions. Electric Fence and dmalloc are hardly new and far from obscure. If a developer can't be bothered to link against a debugging malloc before testing, then you can't expect their software to be immune to such absurd defects. A few runs whilst using memprof isn't a bad idea, either.
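For example (a toy bug of mine, not from TFA), Electric Fence turns this silent one-byte overflow into an immediate segfault at the faulting store, because each allocation sits flush against an inaccessible guard page:
/* off_by_one.c -- build with: gcc -g off_by_one.c -lefence
 * (or run an unmodified binary with LD_PRELOAD=libefence.so) */
#include <stdlib.h>

int main(void) {
    char *p = malloc(16);
    p[16] = 'x';    /* one past the end: SIGSEGV under efence */
    free(p);
    return 0;
}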
This assumes you're using a language like C, which is not a trivial language to write correct software in. For many programs, you are better off with a language like Occam (provided for Unix/Linux/Windows via KROC) where the combination of language and compiler heavily limits the errors you can introduce. Yes, languages this strict are a pain to write in, but the increase in the initial pain is vastly outweighed by the incredible reduction in agony when debugging - if there's any debugging at all.
I do not expect anyone to re-write glibc in Occam or any other nearly bug-proof language. It would be helpful, but it's not going to happen.
Re: (Score:2)
Your claim that smart programmers using dmalloc, Electric Fence, or some other bounds checker will find all buffer overflows seems misguided to me. Those tools are great for catching buffer overflows that are actually triggered by your test suite. But aren't most buffer-overflow security holes caused by weird corner cases no one thought of? I mean, in the real world it's never caused by som
Re: (Score:2)
I would go farther than that and say that writing a correct program in any language is not trivial. While it is true that certain languages (like Java) limit the kinds of errors a programmer can make, they do nothing to limit the number of errors a programmer can make.
While I understand there are some languages more appropriate for solving certain types of problems, making a language programmer-proof is never a worthy goal. Usually, the attempt to make a "better COBOL" ends up evolving like this:
Re: (Score:2)
What's so hard about this?
The "write the algorithms correctly" part. The demand for programs is much larger than the supply of sufficiently trained/disciplined/talented programmers. Therefore, we need a solution that gives acceptable results even when the programmer isn't a guru (and preferably when the programmer is a trained monkey, because he often will be)
Re: (Score:2)
Wouldn't a language do the same job? (Score:2)
Note: I'm sure other functional lang
Re: (Score:2)
Yes.
But it's not a practical solution, for about 185 different reasons, starting with the fact that very few commercial apps are written in any kind of dynamic language, let alone LISP, and they're not likely to be rewritten anytime soon for such an intangible reason as security. rpg was right that worse is better, and the last language will be C. He wrote that before Java, Ruby, etc., but I think it's still right. Like it or hate it.
Re: (Score:2)
(defun bar (array i x)
  "Set the Ith element of array ARRAY to X."
  (declare (type fixnum i)
           ;; the original declaration was cut off here; presumably it
           ;; constrained ARRAY's type, something like:
           (type (simple-array fixnum (*)) array))
  (setf (aref array i) x))
Buggy (Score:3, Interesting)
Addendum (Score:2)
Ren and Stimpy said it best (Score:2)
Lots of memory available? (Score:5, Interesting)
In reading this article, I started to wonder a lot about this. Writing to conserve memory is a bad thing? I will say that I haven't noticed that in most software, regardless of whether it's OSS or closed-source. If anything, there seems to be a variation of Parkinson's Law in effect. Yes, computers these days have a lot more memory available; however, the number of applications and the size demands of each application have grown almost in lock-step with that. 15 or so years ago, yes, you had one OS and one application running - maybe, if you were lucky or were running TSR apps, two or three. These days, the OS takes up a hefty chunk, and it's not uncommon to see 8 or 9 (if not more) applications running at once. What they all seem to have in common is that they assume they have access to all the RAM, or as much of it as they can grab.
I have to wonder if he's actually looked at things these days. I don't see where programming (properly done) to conserve memory is a bad thing. If anything, it seems that few are actually doing it.
Re: (Score:2)
access to all the RAM, or as much of it as they can grab.
They don't assume "access to as much RAM as they can grab", they assume "access to as much RAM as they need". Given the presence of gigabyte RAM modules, virtual memory, and near-terabyte hard drives, this is usually a reasonable assumption.
I have to wonder if he's actually looked at things these days. I don't see where programming (properly done) to conserve memory is a bad thing. If
Re: (Score:2)
Memory is dirt-cheap these days, and if you've got it you might as well put it to use.
You have an interesting definition of "dirt cheap". Doing a quick check, 1 GB of RAM is running around $175-$200. Admittedly, that's a lot cheaper than 12 to 15 years ago, when it was averaging $25 a MB, but I don't consider that "dirt cheap." The problem, as you pointed out, is that they grab as "much as they need", or, more correctly, as much as the developer(s) think it needs. That's fine in an isolated system, where it's t
Re: (Score:2)
Look, if you want to run RAM hungry apps, you need to either purchase more memory, or open fewer apps at once. Or, I guess you could go back to using the apps that you were using a few years ago. I'm sure they'll run with the same, small memory footprint that you want them to.
Re: (Score:2)
Re: (Score:2)
If you want to solve this problem, professors need to
Randomness. Nooooo! (Score:3, Insightful)
The worst bugs are the ones that are hard to reproduce. In fact, when faced with a bug that's difficult to reproduce, I've been known to quip "yet another unintentional random number generator". The suggestion that they're going to apply a pseudo-fix that involves random allocations raises all kinds of red flags. I'd much rather have fine-grained control over which sections of code are allowed to access which sections of memory, and be able to track which sections of code are accessing a chunk of memory. I'd much rather have strict enforcement of a non-execute bit on memory that's only supposed to contain data (there is some support for this already). Introducing randomness into memory allocation? Worst. Idea. Ever. It's like throwing in the towel, and if they put that in at low levels in system libs and things like that, we're screwed in terms of ever being able to *really* fix the problem. If their compiler is going to link against an allocator that has this capability, I hope they provide the ability to disable it.
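For what it's worth, the non-execute enforcement part is already expressible at the syscall level (a POSIX sketch; the helper name is mine): map data buffers without PROT_EXEC, and any payload smuggled into them faults on its first instruction fetch.
#include <stddef.h>
#include <sys/mman.h>

/* Data-only allocation: readable and writable, never executable. */
void *alloc_noexec(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,  /* no PROT_EXEC */
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}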
Re: (Score:2)
from the plan 9 fortune file :
Almost all good computer programs contain at least one random-number generator.
has been done before (Score:2)
In practice, however, a decent set of test cases together with valgrind will make any of those runtime gymnastics unnecessary.
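Case in point (a toy example of mine, not from the article): valgrind's memcheck flags the bug below at the exact read, with stack traces for the access, the free, and the original malloc -- no randomized allocator required.
/* use_after_free.c -- compile with -g, then run: valgrind ./a.out */
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);
    *p = 42;
    free(p);
    return *p;    /* "Invalid read of size 4" under memcheck */
}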
Re: (Score:2)
First of all, none of what you say invalidates what I was saying: the underlying techniques are well-known, so DieHard is at most an engineering effort.
Is it better than those other systems? I don't think so; DieHard just makes a different set of tradeoffs than other systems. In fact, DieHard can't even find all the problems that Valgrind finds.
Overall, I think retroactive attempts to fix C/C++'s memory management
Do NOT INSTALL THIS (Score:2, Informative)
It's the programmers fault? (Score:2, Informative)
Typically, a programmer is doing their job. The programmer's manager is d
Re: (Score:2)
It is a programmer's responsibility to find out what "properly" is.
Speed - buffer overflow? (Score:3, Interesting)
Re: (Score:2)
are you sure it's not the sequel? (Score:2)
http://www.imdb.com/title/tt0337978/ [imdb.com]
When a criminal plot is in place to take down the entire computer and technological structure that supports the economy of the United States (and the world), it's up to a decidedly "old school" hero, police detective John McClane (Willis), to take down the conspiracy, aided by a young hacker (Long).
Why the restriction to "non-commercial" use? (Score:2, Insightful)
Re: (Score:2)
Re: (Score:2)
Die Hard 4 - going commando.
Re:What a load of shit. (Score:4, Interesting)
1) If there is a program crash, it may be possible to reproduce the bug on the same computer, but probably not on two different ones, such as the user's and the developer's.
2) It discourages programmers from good design and thorough testing by leading them to believe that bugs won't occur.
The claim for DieHard (from the whitepaper) is that it "tolerates memory errors and provides probabilistic memory safety". But bugs will still happen! I once added about 10 lines of code to log a bug our team was having a hard time tracking down. It turned out to have its own bug that would be hit if:
- Two threads were accessing the same buffer
AND
- One of them was swapped out during the execution of 3 machine instructions (out of about a million)
It took my moderately sized customer base 2 weeks to hit it. The only way to avoid memory errors is to make the code bulletproof, which means fixing it when bugs are reported.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Clueless is thinking you can reclaim the meaning of a word once a new definition becomes common usage in a larger world.
Re: (Score:2)
People who hav
Re: (Score:2)
Re: (Score:2)
I understand that a number of more recent CPU architectures have taken the step of putting the return address into a register, rather than on the stack. Itanium is an example.
Re: (Score:2)
In my experience, the article you suggest could be summed up in just one sentence:
"Our quarterly numbers are dependent on getting code out the door, and it's far more important to management to hit those quarterly numbers than to guarantee quality."