Slashdot: News for Nerds

Are You Sure This Is the Source Code?

timothy posted about a year ago | from the not-as-simple-as-md5-sum dept.

Open Source 311

oever writes "Software freedom is an interesting concept, but being able to study the source code is useless unless you are certain that the binary you are running corresponds to the alleged source code. It should be possible to recreate the exact binary from the source code. A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."


311 comments

Bogus argument (5, Insightful)

Beat The Odds (1109173) | about a year ago | (#44063025)

"Exact binaries" is not the point of having the source code.

Re:Bogus argument (-1, Redundant)

thepike (1781582) | about a year ago | (#44063039)

This right here

Re:Bogus argument (0, Redundant)

Anonymous Coward | about a year ago | (#44063159)

Thank you for adding so much insight to the discussion.

I mean, uhh... yeah, this!

Re:Bogus argument (2)

tloh (451585) | about a year ago | (#44063245)

<Jedi>These are not the code you are looking for.....</Jedi>

*ducks*

Re:Bogus argument (5, Informative)

Anonymous Coward | about a year ago | (#44063133)

The guy who submitted that article is the person who wrote it. Awesome "work", editors.

Re:Bogus argument (4, Insightful)

icebike (68054) | about a year ago | (#44063457)

But to his credit, he did say a "simple analysis" - although after reading TFA, it seems he omitted the word "minded" from the middle of that phrase.

Virtually all of his findings trace back to differences in date and time, chosen compiler settings, and compiler vintage.
Unless he can find large blocks of inserted code (not merely data segment differences), he is complaining about nothing.

He is certainly free to compile all of his system from source, and that way he could be assured he is running
exactly what the source said. But unless and until he reads AND UNDERSTANDS every line of the source, he is
always going to have to trust somebody somewhere.

It's pretty easy to hide obfuscated functionality in a mountain of code (in fact, it seems far too many programmers pride
themselves on their obfuscation skills). I would worry more about the mountain he missed while staring at the
molehill his compile environment induced.

Re:Bogus argument (0)

Anonymous Coward | about a year ago | (#44063469)

When someone can submit their Slashdot Journal entries as articles and have them accepted, I fail to see the problem with someone submitting his own blog post.

Re:Bogus argument (3, Funny)

briancox2 (2417470) | about a year ago | (#44063527)

This looks like the shortest, most concise piece of FUD I've ever seen.

I wonder if next week I could get a story published that says, "I don't know if Microsoft is spying on you through your webcam. So it could be true."

Re:Bogus argument (4, Insightful)

CastrTroy (595695) | about a year ago | (#44063207)

Ok, maybe not exact binaries, but what if you can't even make a binary at all, or if you do make one, how do you ensure it's functioning the same? That's the problem many people have with open source code written in languages that you can only compile with a proprietary compiler. Take .NET, for instance. It's possible to write a program that is open source, and yet you're at the mercy of Microsoft to be able to compile the code. Even when I download Linux packages in C, it's often the case that I can't compile them, because I'm missing some obscure library that the original developer just assumed I had. "What good is code if you are unable to compile it?" is right up there with "what use is a phone call if you are unable to speak?" Some code only works with certain compilers, or with certain flags turned on in those compilers. Simply having the source code doesn't mean you have the ability to actually use it to make bug fixes should the need arise.

Re:Bogus argument (3, Insightful)

Anonymous Coward | about a year ago | (#44063309)

To borrow from The Watchmen:

Who compiles the compiler?

Re:Bogus argument (5, Informative)

arth1 (260657) | about a year ago | (#44063511)

To borrow from The Watchmen:

Who compiles the compiler?

Your attribution isn't just a little off, it's way off.
Try Iuvenalis, around 100 AD.

Re:Bogus argument (4, Funny)

NoNonAlphaCharsHere (2201864) | about a year ago | (#44063615)

To borrow from the Tao Te Ching: "The Source that can be told is not the Source."

^^THIS (0)

Anonymous Coward | about a year ago | (#44063379)

And even building on Linux with GNU tools, I have come across source that wouldn't compile, and the endless chase of dependencies and libraries. And problems with libraries no longer supported, or not supported on my platform - *cough*Ubuntu*cough*.

Re:Bogus argument (4, Insightful)

ZahrGnosis (66741) | about a year ago | (#44063417)

If you're worried about the lineage of a binary then you need to be able to build it yourself, or at least have it built by a trusted source... if you can't, then either there IS a problem with the source code you have, or you need to decide if the possible risk is worth the effort. If you can't get and review (or even rewrite) all the libraries and dependencies, then those components are always going to be black-boxes. Everyone has to decide if that's worth the risk or cost, and we could all benefit from an increase in transparency and a reduction in that risk -- I think that was the poster's original point.

The real problem is that there's quite a bit of recursion... can you trust the binaries even if you compiled them, if you used a compiler that came from binary (or Microsoft)? Very few people are going to have access to the complete ground-up builds required to be fully clean... you'd have to hand-write assembly "compilers" to build up tools until you get truly useful compilers then build all your software from that, using sources you can audit. Even then, you need to ensure firmware and hardware are "trusted" in some way, and unless you're actually producing hardware, none of these are likely options.

You COULD write a reverse compiler that's aware of the logic of the base compiler and ensure your code is written in such a way that you can compile it, then reverse it, and get something comparable in and out, but the headache there would be enormous. And there are so many other ways to earn trust or force compliance -- network and data guards, backups, cross validation, double-entry or a myriad of other things depending on your needs.

It's a balance between paranoia and trust, or risk and reward. Given the number of people using software X with no real issue, a binary from a semi-trusted source is normally enough for me.

Re:Bogus argument (3, Insightful)

CastrTroy (595695) | about a year ago | (#44063555)

I'm not really even talking from a trust point of view, but more the other point of open source software, which is, "if there's a bug in the code, you can fix it yourself". Without even going down that whole tangent of recursively verifying the entire build chain, there's the problem of being able to even functionally compile the source code so that you can make fixes when you need to.

Re:Bogus argument (1)

Anonymous Coward | about a year ago | (#44063437)

how do you ensure it's functioning the same?

You run a test suite.

Re:Bogus argument (1)

arth1 (260657) | about a year ago | (#44063579)

You run a test suite.

Which is one reason why important open source programs make sure that the test suite and its sources are also available and up to date.

Or, you examine the source, and then compile it with a compiler from a different source.

Re:Bogus argument (5, Insightful)

oGMo (379) | about a year ago | (#44063547)

Simply having the source code doesn't mean you have the ability to actually use the source code to make bug fixes should the need arise.

And yet, it still means that you can fix it, or even rewrite it in something else, if you want. Not having the source code means this is between much-more-difficult and impossible. The lesson here should be that everything we use should be open source, including compilers and libraries, not "well in theory I might have problems, so screw that whole open source thing .. proprietary all the way!"

Re:Bogus argument (1)

chuckinator (2409512) | about a year ago | (#44063219)

It definitely is. Discounting differences between the hardware of their build machine and yours, and differences in versions of the compiler, libraries, etc., it's still a bogus argument. I've had the same compiler on the same machine produce different binaries on two consecutive builds on the same day, because changing memory addresses of values threw the checksum completely off.

Also, the author needs to install redhat-rpm-config on his system if he's trying to generate stripped binaries with separate debuginfo packages.

Re:Bogus argument (1)

lgw (121541) | about a year ago | (#44063321)

"Exact binaries" is not the point of having the source code.

The use case is "we're using this binary in production, which we didn't build ourselves". That's how open source is generally used in practice, after all - you download the binaries for your platform, and you (maybe) archive the source away somewhere just in case.

Isn't that the strongest practical use case for Open Source in the business world? Sure, you don't plan on maintaining it yourself but you could if you have to. The problem is, if the source doesn't match the object, you can't just fix a bug - you have to requalify this whole new software package which just happened to come from the same place, and which may be a very different version of the software.

I've had to spend time maintaining binaries where we didn't have matching source and we couldn't migrate to what would be built from the source. Maintaining raw binaries with the source as just a guideline blows goats, and you never want to be the guy stuck doing it.

Re:Bogus argument (2)

Chuckstar (799005) | about a year ago | (#44063411)

No. The strongest practical use case for Open Source in business is that the Open Source version is some combination of better/cheaper than alternate versions, with "better" including the fact that Open Source projects often get updated faster when security bugs (and sometimes other bugs) are found. The possibility of bringing development fully in-house is not a practical solution for 99.99% of businesses. (I'm exaggerating a little, but not much).

Re:Bogus argument (2)

jythie (914043) | about a year ago | (#44063397)

Yeah.. it really strikes me that the person is exaggerating the importance of a narrow set of use cases. Reproducible builds are nice, and in some cases important, and ideally compiling should be sufficiently deterministic that one could recreate any given binary, but I would not say that is the 'point' of having access to source code.

Re:Bogus argument (1)

OakDragon (885217) | about a year ago | (#44063587)

Already at 5, Insightful, so please enjoy this virtual "+1 Insightful"...

Re:Bogus argument (4, Interesting)

Aaron B Lingwood (1288412) | about a year ago | (#44063631)

"Exact binaries" is not the point of having the source code.

You are correct. However, it is a method to confirm that you have received the entire source code.

The point being made is that a binary could always contain functions that are malicious, buggy or infringe on copyright while the supplied source does not.

Case Study:

A software company (let's call them 'Macrosift') takes over project management of a GPL'd document conversion tool. Macrosift contributes quite a bit of code and the tool really takes off. Most users obtain this tool as a pre-compiled binary from either the Macrosift-controlled repository or a Macrosift partner-controlled repository. It can even convert all kinds of documents flawlessly into Macrosift's Orifice 2015 new extra standard format, which no other tool seems to be able to do.

Newer versions of OpenOffice, LibreOffice, JoeOffice come out and this tool just doesn't seem to be doing the job. Sure, it converts perfectly from everything into MS .xsf, but it doesn't work so well the other way, and won't work at all between some office suites. The project gets forked by the community to make it feature complete. The project managers start by compiling the source, and to their surprise, the tool does not work as well as the binary did. After a year passes, the community realizes they've been had. By painstakingly decompiling the binary, they discover that the function that converts to MS's proprietary .xsf is different from the one in the source. Another hidden function is discovered in the binary that introduces errors and file bloat after a certain date if the tool is being used solely on non-MS documents.

How else can I ascertain whether you have supplied me with THE source code for THIS binary, if I cannot produce said binary from the provided source code?

Simple Analysis (1)

mapfortu (2567463) | about a year ago | (#44063029)

f(x) = (x + 2) ^ 2

Now multiply that by the number of transistors required to run the compiler. In that ballpark.

WTF... (0)

Anonymous Coward | about a year ago | (#44063033)

I think I'm done with slashdot. The "articles" have just become tweets in disguise.

Re:WTF... (0)

Anonymous Coward | about a year ago | (#44063195)

How would anyone know? "Articles" are hypothetical. We read summaries, if that.

What a problem (0)

Anonymous Coward | about a year ago | (#44063037)

Has anybody thought about recompiling the source and seeing if you get the same binary?

Re:What a problem (3, Insightful)

jedidiah (1196) | about a year ago | (#44063129)

...or just using a binary that you compiled from source yourself.

For a lot of projects, that's not nearly as hard as some people like to make it sound.

Re:What a problem (5, Funny)

h4rr4r (612664) | about a year ago | (#44063241)

Hey now, you have to be pretty IT savvy to type ./configure, make and make install all in the same day. Some of us make good money doing that, don't just go suggesting everyone should be doing it.

more difficult in practice (2)

Chirs (87576) | about a year ago | (#44063441)

./configure, make, make install assumes you're building on the target machine. Many times you want to build on one machine and deploy on another. Even now, there are a lot of packages that don't work properly when cross-compiling. So you end up hardcoding config files, overriding options, patching the source/Makefiles, etc.

Also, in our environment we need to isolate the build system from the host environment to avoid contamination from the host libraries, and we need to version-control the build system so that we can go back and build the same product we built three years ago for the purposes of fixing a bug for a paying client.

So while open-source helps a lot, many times it takes significant effort to bring in some arbitrary package and build it from source.

Re:more difficult in practice (1)

h4rr4r (612664) | about a year ago | (#44063467)

Whoosh!

That was the joke passing over your head.

I of course agree with everything you said. I was merely being flippant for the sake of humor.

Re:What a problem (0)

Anonymous Coward | about a year ago | (#44063557)

No make test? The flying screens of white text on black they produce are tangible evidence of your "productivity"! Never make packages without it!

Re:What a problem (1)

Qzukk (229616) | about a year ago | (#44063137)

Has anybody thought about recompiling the source and seeing if you get the same binary?

The article says you can try, but you won't get the same binary.

Re:What a problem (1)

cold fjord (826450) | about a year ago | (#44063143)

Has anybody thought about recompiling the source and seeing if you get the same binary?

That doesn't necessarily work unless you have the exact same build environment (libraries, compilers, etc.), and compiler settings.

Re:What a problem (1)

Synerg1y (2169962) | about a year ago | (#44063177)

I thought so... the build environment does affect the final hash. Thinking about this logically, though: in most cases you get the source code and the executable from the same place... and if the executable matches, how paranoid can you be?

If you're getting the alleged source code to Windows 9 from some guy in Nigeria though, set your expectations accordingly.

Re:What a problem (1)

gweihir (88907) | about a year ago | (#44063165)

That is what the OP is talking about.

Suddenly it becomes obvious what the AC posting possibility is really about...

Re:What a problem (1)

X0563511 (793323) | about a year ago | (#44063201)

Differing library, linker, compiler versions, configurations, and parameters would all change the output. You'd have to use the exact same system for the two builds, or you are not guaranteed to get a byte-for-byte duplication.

Re:What a problem (5, Insightful)

TheRaven64 (641858) | about a year ago | (#44063421)

Most of the time, even that isn't enough. C compilers tend to embed build-time information as well. For Verilog, they often use a random number seed for the genetic algorithm for place-and-route. Most compilers have a flag to set a specified value for these kinds of parameters, but you have to know what they were set to for the original run.

Of course, in this case you're solving a non-problem. If you don't trust the source or the binary, then don't run the code. If you trust the source but not the binary, build your own and run that.

Re:What a problem (4, Insightful)

arth1 (260657) | about a year ago | (#44063319)

Has anybody thought about recompiling the source and seeing if you get the same binary?

Has anybody thought of reading the article before posting questions like this?

That said, this particular "article" isn't worth the bytes it takes up. It's like watching a 6-year-old try to explain a combustion engine.

Binaries will almost always differ - if nothing else, because you need an environment exactly like the original builder's. Not just the time stamps, compile paths, hostnames and account names, which are the obvious ones.
If your compiler or linker is a minor version off from what he used, the results can be very different, even with the same compile options.
And that's still not enough: if your hardware is different, randomization of functions in a library will be different.

To flesh out his article a bit more, the author could have done a test with two different Gentoo systems. Different but mostly compatible hardware, and a slight difference in the toolchain. That might have opened his eyes.
Then again, probably not.
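The timestamp effect arth1 leads with is trivially reproducible. A two-command experiment (compiler and flags here are illustrative, any C toolchain will do):

```shell
# A source file that embeds its compile time via the __TIME__ macro.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("built %s\n", __TIME__); return 0; }
EOF
cc -o hello1 hello.c
sleep 2          # let the clock tick so __TIME__ changes
cc -o hello2 hello.c
# Same source, same compiler, same flags -- and the bytes still differ.
cmp -s hello1 hello2 && echo identical || echo different
```

This prints `different`: nothing about the program changed, yet a naive byte comparison "fails", which is exactly the kind of mismatch TFA is complaining about.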

Being able to is nice, but who has the time? (4, Interesting)

intermodal (534361) | about a year ago | (#44063047)

Given the scale of most modern programs' codebase, good luck actually reviewing the code meaningfully in the first place. That said, if you're really that concerned about the code matching the source, run a source-based distro like Gentoo or Funtoo. For most practical purposes, though, users find binary distributions like Debian/Ubuntu or the various Red Hat-based systems to be more effective in regards to their time.

bugfixes, not paranoia (1)

Chirs (87576) | about a year ago | (#44063497)

We frequently discover a bug and need to fix it without upversioning the whole package (which could result in other incompatibilities with the rest of the system).

So we track down the code for the version we're using, get it building from source with suitable config options, and then fix the bug. In the simple case the bugfix is present in a later version and we can just backport it. In the tricky case you need to get familiar enough with the code to fix it (and hopefully in a way that the upstream maintainers will accept).

The obvious thing is (4, Insightful)

Chrisq (894406) | about a year ago | (#44063049)

If you are that paranoid study the source code then recompile

Re:The obvious thing is (1)

gl4ss (559668) | about a year ago | (#44063475)

If you are that paranoid study the source code then recompile

yeah, if he is bothering to read through it, he could quite easily be bothered to compile it as well.. that's what he was going to do anyhow, to compare.

also, you could clone the compile chain of popular linux distros as well, without fuss. it's not like they hide their build systems behind closed doors.

Re:The obvious thing is (1)

Jon Stone (1961380) | about a year ago | (#44063541)

That's not guaranteed to address the problem. http://cm.bell-labs.com/who/ken/trust.html [bell-labs.com] To compile the source code you used the binary compiler...

Timestamps make a difference (0)

Anonymous Coward | about a year ago | (#44063053)

Lots of builds include a timestamp or use it so this isn't always guaranteed.

I like to use auto-generated hash signatures of code in my builds when I want to know an exact version or even exact build of the same source tree.

Re:Timestamps make a difference (1)

Samantha Wright (1324923) | about a year ago | (#44063325)

In TFA, that was the major source of difference. Debian, Fedora, and OpenSUSE packages were tested; Debian differed only in the timestamps, OpenSUSE had a few lingering debug features, and the Fedora binary was a little weirder (perhaps the result of a different compiler version?)

Exact match (0)

Anonymous Coward | about a year ago | (#44063057)

The trouble with trying to get an exact match is there are so many variables. Do you have the same operating system, the same architecture, the same versions of the same libraries, the same version of the same compiler? What about the same compiler flags? Unless all of those things are an exact match the odds against getting a matching binary are slim. Really, though, it becomes a bit of a moot point because, once you have the source code, you can create your own binary and don't have to wonder if the previous binary was a match.

Then recompile it (0)

Anonymous Coward | about a year ago | (#44063063)

Now you have a binary which "corresponds" to the source code.

Gentoo (0)

Anonymous Coward | about a year ago | (#44063087)

Thus narrowing the issue to the binaries in the stage3 archive.

Re:Gentoo (-1)

Anonymous Coward | about a year ago | (#44063197)

If you're not getting the same binary in Gentoo, obviously you forgot to -funroll-loops.

Re:Gentoo (1)

Bill, Shooter of Bul (629286) | about a year ago | (#44063595)

-funroll-loops, the breakfast of champions.

How about this: (-1)

Anonymous Coward | about a year ago | (#44063089)

U0 Main()
        {
            GetChar();
            PrintF("Bible Line:%d",HPET%100000);
        }

        Main;

http://www.randomnumbers.info/

God says:

The Proverbs

1:1 The proverbs of Solomon the son of David, king of Israel; 1:2 To
know wisdom and instruction; to perceive the words of understanding;
1:3 To receive the instruction of wisdom, justice, and judgment, and
equity; 1:4 To give subtilty to the simple, to the young man knowledge
and discretion.

1:5 A wise man will hear, and will increase learning; and a man of
understanding shall attain unto wise counsels: 1:6 To understand a
proverb, and the interpretation; the words of the wise, and their dark
sayings.

1:7 The fear of the LORD is the beginning of knowledge: but fools
despise wisdom and instruction.

1:8 My son, hear the instruction of thy father, and forsake not the
law of thy mother: 1:9 For they shall be an ornament of grace unto thy
head, and chains about thy neck.

touch o' hyperbole (5, Insightful)

ahree (265817) | about a year ago | (#44063099)

I'd suggest that "severely limiting the whole point of running free software" might be a touch of an exaggeration. A huge touch.

Re:touch o' hyperbole (1)

Anonymous Coward | about a year ago | (#44063269)

Call me stupid, but if you bother to build your own binary, why would you download the binary at all instead of running the one you compiled?

Re:touch o' hyperbole (4, Interesting)

MozeeToby (1163751) | about a year ago | (#44063533)

The issue the author is bringing up is that you have no way to easily determine that the published binary is, in fact, functionally identical to the published source code. Imagine you write an app that accesses private data and open source it, saying "check the source, the only thing we use the data for is X". And if you look at the source, that's certainly true. But there's no way to verify that the binary download was built from the published source; especially if the resulting binary is different every time you build it and different if you build it on different machines with different configurations. So, everyone who grabs the binary instead of building from source is taking it on trust, just like proprietary software, that the program does what it claims.

Re:touch o' hyperbole (1, Informative)

Anonymous Coward | about a year ago | (#44063599)

This would be true if an executable binary were some kind of quantum black box, like the inside of a proton or whatever. In actual fact, a binary is machine code that can be disassembled, and you can compare the differences between the version you compiled and the published version as much as you like. The article writer found that usually the "difference" was a build timestamp, because duh.
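A minimal sketch of the disassemble-and-compare approach, assuming a GNU toolchain with `objdump` on the PATH. Comparing disassembly rather than raw bytes sidesteps metadata differences; here the only header to strip is objdump's filename line:

```shell
# A small function compiled twice from identical source.
cat > t.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF
cc -c -O2 -o t1.o t.c
cc -c -O2 -o t2.o t.c
# Disassemble both objects, skipping the header lines that name the file.
objdump -d t1.o | tail -n +3 > d1.txt
objdump -d t2.o | tail -n +3 > d2.txt
# The actual instruction streams should match.
diff d1.txt d2.txt && echo "code matches"
```

Against a real published binary you would diff against the distro's object instead of a second local build, and expect benign deltas (timestamps, embedded paths) alongside anything genuinely suspicious.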

Re:touch o' hyperbole (0)

idontgno (624372) | about a year ago | (#44063625)

OK, stupid.

Well, it was your idea.

Anyway, sometimes the binaries and the sources are downloaded together. This happens a lot in single-tarball releases, for instance.

But yeah, more often, if you don't want the binaries, you can certainly avoid downloading them, or unpackaging them and running them if you did download them.

Source-only tarballs, for instance, or just source packages.

Of course, you still have to run the binaries of your operating system. And your toolchain.

But no one would ever corrupt those [bell-labs.com] .

Hyperbole? (0)

Anonymous Coward | about a year ago | (#44063107)

"severely limiting the whole point of running free software"

Yet somehow we survive!

Incorrect suppositions. (5, Insightful)

Microlith (54737) | about a year ago | (#44063123)

A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."

No it doesn't. The whole point of running free software is knowing that I can rebuild the binary (even if the end result isn't exactly the same) and, more importantly, freely modify it to suit my needs rather than being beholden to some vendor.

Re:Incorrect suppositions. (5, Insightful)

Shoten (260439) | about a year ago | (#44063375)

A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."

No it doesn't. The whole point of running free software is knowing that I can rebuild the binary (even if the end result isn't exactly the same) and, more importantly, freely modify it to suit my needs rather than being beholden to some vendor.

There's another point too...which incidentally is the whole point of running a distro like Gentoo...that you can compile the binary exactly to your specifications, even sometimes optimizing it for your specific hardware. I don't get at all this idea he has about "reproducible builds;" if he builds the same way on the same hardware, he'll get the same binary. But what he's doing is comparing builds in distros with ones he did himself...and the odds that it's the same method used to create the binary are very low indeed.

If he's concerned about precompiled binaries having been tampered with, he's looking at the wrong protective measure. Hashes and/or signing are what is used to protect against that...not distributing the source code alongside the compiled binary files. If you look at the source code and just assume that a precompiled binary must somehow be the same code "just because," you're an idiot.

Re:Incorrect suppositions. (0)

Anonymous Coward | about a year ago | (#44063385)

I usually want the binary that I compiled to be different from the original. There may be a new x86 chip with a few nice op-codes, or a GPU chip that will do vector arithmetic ONLY IF I re-compile.

Sometimes, the package maintainer failed to build it right. There's a Python lxml connector library for RPM that can seg-fault using the SAX parser. Its GCC optimization level doesn't match the binary "lxml.so", and linkage fails. I rebuilt it, and now I have SAX rather than a big DOM memory footprint.

Re:Incorrect suppositions. (0)

Anonymous Coward | about a year ago | (#44063479)

yes, exactly.

how is it that the 'open source' movement got so fixated on shipping binaries instead of shipping an
environment that makes it possible to work with the _source_?

note that spending an hour trying to replicate the version of 'ifconfig' that I'm running, with the exact
path set and no guarantee it was built the same way, isn't really the same level of access as, for example, BSD,
where I can always recompile from the exact source if I choose

Re:Incorrect suppositions. (0)

Anonymous Coward | about a year ago | (#44063627)

A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."

No it doesn't. The whole point of running free software is knowing that I can rebuild the binary (even if the end result isn't exactly the same) and, more importantly, freely modify it to suit my needs rather than being beholden to some vendor.

+1. Just ask any business owner who would rather not "upgrade" (Ha!) to Windows 8.
Free software always lasts exactly as long as every tool should: Until it is no longer useful.

Not a concern (4, Insightful)

gweihir (88907) | about a year ago | (#44063141)

If you need to be sure, just compile it yourself. If you suspect foul play, you need to do a full analysis (assembler-level or at least decompiled) anyways.

The claim that this is a problem is completely bogus.

Re:Not a concern (2)

bunratty (545641) | about a year ago | (#44063243)

Diverse Double-Compiling by David A. Wheeler (5, Interesting)

tepples (727027) | about a year ago | (#44063317)

If you've compiled the compiler with competitors' compilers (try saying that ten times fast), you should be fairly safe from Trusting Trust [dwheeler.com] .

Re:Diverse Double-Compiling by David A. Wheeler (3, Funny)

bunratty (545641) | about a year ago | (#44063341)

But nuking it from orbit is the only way to be sure.

This is my darkest fear... (2)

erroneus (253617) | about a year ago | (#44063153)

It's a fair argument. If you are not compiling your binaries, how do you know what you have is compiled from the source you have available?

Truth? You don't. If you suspect something, you should investigate.

Re:This is my darkest fear... (1)

SirGarlon (845873) | about a year ago | (#44063301)

If you suspect something, you should investigate.

And on an open-source OS, you can.

Re:This is my darkest fear... (2)

jdunn14 (455930) | about a year ago | (#44063361)

Sorry to tell you, but Ken Thompson talked about how you pretty much have to trust someone back in 1984: http://cm.bell-labs.com/who/ken/trust.html [bell-labs.com]

If no one else, you have to trust the compiler author isn't pulling a fast one on you....

Re:This is my darkest fear... (1)

Bearhouse (1034238) | about a year ago | (#44063373)

It's a fair argument. If you are not compiling your binaries, how do you know what you have is compiled from the source you have available?

Truth? You don't. If you suspect something, you should investigate.

You're right, of course. But that's not quite the (non) argument he was making, I think.
My understanding was that he wanted to check how easy it was to get the same result by compiling the publicly available source and comparing it to the objects.
Turns out that, due to datestamps etc., the results were slightly different, but no biggie.

Anyway, in a production environment you should be compiling from source, since - security concerns aside - that's the only way to be sure you've got the correct source for your objects.

Not the whole point of free software. (0)

Anonymous Coward | about a year ago | (#44063163)

The point of free software isn't that you can know that a particular binary is from particular code.
The point is that you have the code available for inspection and that you can modify and build it yourself.
If your build behaves differently it will soon become clear that that binary is not the same.

Problems with verifying the binaries from source (5, Funny)

tooslickvan (1061814) | about a year ago | (#44063167)

I have recompiled all my software from the source code and verified that the binaries match but for some reason there's a Ken Thompson user that is always logged in. How did Ken Thompson get into my system and how do I get rid of him?

Re:Problems with verifying the binaries from sourc (1)

tepples (727027) | about a year ago | (#44063359)

I have recompiled all my software from the source code and verified that the binaries match

How many different compilers did you use? Did you try any cross-compilers, such as compilers on Linux/ARM that target Windows/x86 or vice versa?

How did Ken Thompson get into my system

See bunratty's comment [slashdot.org] .

and how do I get rid of him?

See replies to bunratty's comment.

Are You Sure This Is the Source Code? (2)

jedidiah (1196) | about a year ago | (#44063173)

> Are You Sure This Is the Source Code?

Yes. Yes I am sure. I built it myself. It even includes a few of my own personal tweaks. It does a couple of things that the normal binary version doesn't do at all.

Poor testing, waste of time (0)

Anonymous Coward | about a year ago | (#44063181)

Most distributions use mostly identical software, so chances are you end up with identical gcc and so are comparing identical behaviours. Not very useful.

FreeBSD now has a binary patch system and to that end someone worked out how to create binary diffs from freshly built packages against older ones. One of the major pitfalls is timestamps inserted by the compiler. Adjust for that and re-creating suddenly gets a lot more predictable.

Apparently this "tester" hasn't taken a very close look at what is really happening, but thought it more important to wax lyrical about his dreams and then moan that he couldn't make them reality.

Well, he hasn't really tried, I say. Consequently, his blogged moaning is a waste of time.
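The timestamp pitfall described above is easy to demonstrate. As a sketch, Python's gzip module exposes the same header field that the gzip tool writes: two compressions of identical input differ whenever the embedded mtime differs, and pinning it makes the output bit-for-bit reproducible.

```python
import gzip

data = b"the same input every time"

# Identical payload, different embedded timestamps: the archives
# differ even though they decompress to the same bytes.
a = gzip.compress(data, mtime=1)
b = gzip.compress(data, mtime=2)

# Pinning the timestamp makes the "build" reproducible.
c = gzip.compress(data, mtime=0)
d = gzip.compress(data, mtime=0)

print(a == b)  # the headers differ
print(c == d)  # bit-for-bit identical
print(gzip.decompress(a) == gzip.decompress(b) == data)
```

(On the command line, GNU gzip's documented equivalent is -n/--no-name, which omits the original name and timestamp from the header.)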

Some things wrong with TFA (3, Informative)

vikingpower (768921) | about a year ago | (#44063185)

1) Submitter is the one who wrote the blog post 2) No cross-reference, no references, no differing opinions at all 3) "severely limiting the whole point of running free software" is more than a bit of an exaggeration

Re:Some things wrong with TFA (1)

Mike Frett (2811077) | about a year ago | (#44063559)

I honestly don't understand the blog post, I'm not severely limited in any way. I somehow feel the user doesn't even know how to compile software and doesn't know anything about Open Source. It doesn't matter if the binary is the same, maybe his is compiled with different flags than mine or maybe I added a patch.

This honestly smells of someone out to discourage usage of Open Source.

Trust (5, Insightful)

bunratty (545641) | about a year ago | (#44063193)

I took a graduate-level security class from Alex Halderman (of Internet voting fame) and what I came away with is that security comes down to trust. To take an example, when I walk down the street, I want to stay safe and avoid being run over by a car. If I think that the world is full of crazy drivers, the only way to be safe is to lock myself inside. If I want to function in society, I have to trust that when I walk down the sidewalk that a driver will not veer off the road and hit me.

When you order a computer, you simply trust that it doesn't have a keylogger or "secret knock" CPU code installed at the factory. It's exactly the same with software binaries, of course. In the extreme case, even examining all the source code will not help [win.tue.nl] . You must trust!

Re:Trust (1)

jdunn14 (455930) | about a year ago | (#44063383)

So very true. In the end it all comes down to trust and as I posted above (before noticing yours) Thompson explained it extremely well.

Re:Trust (1)

saveferrousoxide (2566033) | about a year ago | (#44063633)

Maybe those just aren't good examples, but both have way more than simple trust involved. There's a huge disincentive to perpetrate either of those actions. In the case of a driver, there's car repairs, court costs, plus the downstream effects; running down a pedestrian, especially one on a sidewalk, is a life-altering action that no sane individual would perform just on a lark. In the case of an insecure computer, the company would be ruined if it came out that it was doing this to all the systems it sold, and targeting specific individuals would be prohibitively expensive.

No, the ones to worry about are those who have a reward that outweighs the risk. Voting is an excellent example of this.

Deterministic builds.. (3, Interesting)

0dugo0 (735093) | about a year ago | (#44063199)

..are a bitch. The hoops that e.g. the Bitcoin developers jump through to prove they didn't mess with the build are considerable: running specific OS builds in emulators with fake system time and whatnot. No easy task.

Why not just use a source based distro like Gentoo (1)

Anonymous Coward | about a year ago | (#44063203)

If this means that much to you, why not just use a source based distro like Gentoo (You can have the added bonus of it being tuned to your system)?

Augustine (-1)

Anonymous Coward | about a year ago | (#44063213)

Augustine:

throughout the whole order of things, brought this about. For if
when a man by haphazard opens the pages of some poet, who sang and
thought of something wholly different, a verse oftentimes fell out,
wondrously agreeable to the present business: it were not to be
wondered at, if out of the soul of man, unconscious what takes place
in it, by some higher instinct an answer should be given, by hap,
not by art, corresponding to the business and actions of the
demander."

God says...
will; not another, but itself. But it doth not command entirely,
therefore what it commandeth, is not. For were the will entire, it
would not even command it to be, because it would already be. It is
therefore no monstrousness partly to will, partly to nill, but a
disease of the mind, that it doth not wholly rise, by truth upborne,
borne down by custom. And therefore are there two wills, for that
one of them is not entire: and what the one lacketh, the other hath.

Logical Equivalency Checking (2)

RichMan (8097) | about a year ago | (#44063247)

I do IC design. Logical Equivalency Checking is a well-worn tool. You can futz about with the logic in a lot of different ways. LEC means we can do all sorts of optimization and still guarantee equivalent function. We can even move logic from cycle to cycle and have it checked that things are logically equivalent.

You run two compilers on the same source code, you won't get the same code. You run two different versions of the compiler on the same code, you won't get the same code. You run the same compiler with different options, you won't get the same code. They should however all be logically equivalent.
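For a small combinational block the equivalence check can even be done by brute force, which makes the idea easy to sketch. Below, a hypothetical behavioural model and a gate-level ripple-carry implementation of a 4-bit adder are compared over the entire input space (real LEC tools use BDDs or SAT solvers rather than enumeration):

```python
from itertools import product

def adder_behavioural(a, b):
    """Reference model: plain addition, truncated to 4 bits plus carry-out."""
    s = a + b
    return s & 0xF, (s >> 4) & 1

def adder_ripple(a, b):
    """Gate-level ripple-carry implementation of the same function."""
    carry, total = 0, 0
    for i in range(4):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                    # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))  # full-adder carry-out
        total |= s << i
    return total, carry

# Exhaustive equivalence check over the full input space (2^8 cases).
equivalent = all(
    adder_behavioural(a, b) == adder_ripple(a, b)
    for a, b in product(range(16), repeat=2)
)
print(equivalent)
```

The two implementations look nothing alike, which is exactly the situation after an optimizing synthesis (or compilation) pass; the check only asks whether the input/output function is preserved.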

Re:Logical Equivalency Checking (0)

Anonymous Coward | about a year ago | (#44063523)

I do IC design. Logical Equivalency Checking is a well-worn tool. You can futz about with the logic in a lot of different ways. LEC means we can do all sorts of optimization and still guarantee equivalent function. We can even move logic from cycle to cycle and have it checked that things are logically equivalent.

You run two compilers on the same source code, you won't get the same code. You run two different versions of the compiler on the same code, you won't get the same code. You run the same compiler with different options, you won't get the same code. They should however all be logically equivalent.

Logical Equivalency Checking for software with unbounded memory is not possible (i.e., undecidable) because software with unbounded memory is Turing complete.

only if the code is 100% valid (1)

Chirs (87576) | about a year ago | (#44063535)

Depending on compiler options, code that isn't completely valid (signed overflow, underflow, etc.) can end up logically completely different when you turn on optimization.

Re:Logical Equivalency Checking (0)

Anonymous Coward | about a year ago | (#44063577)

guarantee equivalent function

Isn't that undecidable (in general)?

Yeah they're right! (0)

Anonymous Coward | about a year ago | (#44063257)

I thought I knew Slashdot's source code... then Boom! I find this:

meta http-equiv="refresh" content="600"

Compiler flags make this ridiculously nitpicky. (1)

Dputiger (561114) | about a year ago | (#44063333)

Unless I'm missing something pretty profound, even having the exact *source* won't always result in the exact binary. My understanding (and I could be wrong about this) is that you can take a well written program and plug it into multiple compilers. GCC may be one of the most popular options, but it's not the only one.

But compilers all optimize differently. GCC 3.x optimizes somewhat differently than GCC 4.x. You can tweak this behavior by manually setting compiler flags, or you can compile binaries that explicitly target different CPU architectures. A binary compiled to target all x86 processors may run differently on Haswell than a binary that's compiled specifically for Haswell.

In other words, flags set at compile time will change performance characteristics, even if the source code is identical, and while some projects may publish the exact details of every compiler flag they set, this doesn't seem to be the norm. Most projects I've seen say "Here are some binaries, and here's the source code if you want to play with it."

Clearly, the point of source code isn't to exactly duplicate every binary in every situation but to give you the data that goes *into* the compiler before the executable is compiled.

Or am I missing something?

Re:Compiler flags make this ridiculously nitpicky. (1)

Anonymous Coward | about a year ago | (#44063473)

Yes, you are. "A cherished characteristic of computers is their deterministic behaviour: software gives the same result for the same input. This makes it possible, in theory, to build binary packages from source packages that are bit for bit identical to the published binary packages. In practice however, building a binary package results in a different file each time. This is mostly due to timestamps stored in the builds. In packages built on OpenSUSE and Fedora differences are seen that are harder to explain. They may be due to any number of differences in the build environment. If these can be eliminated, the builds will be more predictable. Binary package would need to contain a description of the environment in which they were built.
Compiling software is resource intensive and it is valuable to have someone compile software for you. Unless it is possible to verify that compiled software corresponds to the source code it claims to correspond to, one has to trust the service that compiles the software. Based on a test with a simple package, tar, there is hope that with relatively minor changes to the build tools it is possible to make bit perfect builds."

Regulators (1)

Anonymous Coward | about a year ago | (#44063335)

I've dealt with a case where a regulatory authority must review code and perform the build to match compiled artifacts with distributed binaries in a (large, linux based) embedded system. You can do it if you have absolute control over the build environment.

Funny things come up when you start analyzing compiled or archived build output. I had to modify squashfs tools to prevent uninitialized superblock struct members from causing unreproducible file systems... there are unused members in the struct that just pick up whatever happens to be on the stack at the time and put it in the file archive. In another case I wrote a cpio archive normalizer to 'fix' things like the device major/minor number that gets recorded in the archive. Also, readdir(3) does not sort, which matters when making reproducible archives. There are GCC macros (__TIME__, for instance) that will embed a timestamp in an object file that can be trouble as well. And gzip has an undocumented flag (-m, I believe) to prevent it from sticking a timestamp in a compressed file.

Hexdump, diff and md5sum are your friends. It's possible to do this but you have to go deep.
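A couple of the normalizations listed above (sorted member order, zeroed timestamps and ownership) can be sketched with Python's tarfile module; the archive bytes then depend only on the file names and contents, not on the build host or build time:

```python
import hashlib
import io
import tarfile

def deterministic_tar(files):
    """Build a tar archive whose bytes depend only on names and contents:
    members are added in sorted order and all metadata is normalized."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):          # readdir() does not sort; we must
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0                  # no build timestamp
            info.uid = info.gid = 0         # no host-specific ownership
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Illustrative stand-in contents; any byte strings will do.
files = {"bin/app": b"\x7fELF fake binary", "etc/conf": b"key=value\n"}
first = hashlib.md5(deterministic_tar(files)).hexdigest()
second = hashlib.md5(deterministic_tar(files)).hexdigest()
print(first == second)
```

The same recipe exists for GNU tar on the command line (--sort=name, --mtime, --owner/--group); the point is just that every source of host- or time-dependent metadata has to be pinned explicitly.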

err.. WHAT? (1)

magistrat (514371) | about a year ago | (#44063353)

err.. WHAT?

There is no problem; complete chain exists (3)

SuperBanana (662181) | about a year ago | (#44063415)

This is a problem that doesn't exist. You establish a chain of evidence and authority for the binaries via signing and checksums, starting with the upstream. Upstream publishes source and there's signing of the announcement which contains checksums. Package maintainer compiles the source. The generated package includes checksums. Your repo's packages are signed by the repo's key.

You can, at any point in time with most packaging systems, verify that every single one of your installed binaries' checksums match the checksums of the binaries generated by the package maintainer.

If you don't trust the maintainer to not insert something evil, download the distro source package and compile it yourself.

If you suspect the distro source package, all you have to do is run a checksum of the copy of the upstream tarball vs the tarball inside the source package, and then all you need to do is review the patches the distro is applying.

If you suspect the upstream, you download it and spend the next year going through it. Good luck...
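Each link in that chain boils down to a digest comparison plus a signature on the published digest. A minimal sketch of just the checksum step (the tarball bytes here are stand-ins), leaving signature verification of the announcement to GPG:

```python
import hashlib

def verify_sha256(payload: bytes, published_hex: str) -> bool:
    """Compare a package's digest against the maintainer-published one."""
    return hashlib.sha256(payload).hexdigest() == published_hex

tarball = b"pretend this is foo-1.0.tar.gz"
published = hashlib.sha256(tarball).hexdigest()  # what upstream would announce

print(verify_sha256(tarball, published))          # untampered copy
print(verify_sha256(tarball + b"evil", published))  # modified copy fails
```

Note this only proves the binary you hold matches the one the maintainer published; it says nothing about whether the maintainer's build matches the source, which is the reproducible-builds problem the article is actually about.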

nothing new here, please move along... (1)

Nightshade (37114) | about a year ago | (#44063445)

Even if you have the source, it doesn't mean you can confirm what the binary is doing. See the classic "Trusting Trust" attack which is decades old. In my experience the most common reason for binaries that are not reproducible is build timestamps being embedded into the binary. For example, the ar command added the D flag in the past few years exactly for the purpose of being able to output reproducible results. (see the man page at http://linux.die.net/man/1/ar [die.net] ) It's true that reproducible binaries are probably a good thing from a security standpoint, but in practice it can be a lot of work to make sure the build produces these. And even then, as Thompson showed, that doesn't always guarantee that what you see is what you get.

Tah da (2)

ElitistWhiner (79961) | about a year ago | (#44063581)

Finally, someone gets it. The backdoor is never where you're looking for it.

That's funny (0)

Anonymous Coward | about a year ago | (#44063589)

I thought the point of open source software (from the user end) is that you can get it for free to do some trivial task that you only need to do a few times, where buying some commercial software would be a waste of money (or so you don't have to find cracks or keys for the commercial software).

Meh. (0)

Anonymous Coward | about a year ago | (#44063591)

> It should be possible to recreate the exact binary from the source code. A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software...

I want the binaries to be different, because my PC is a particular combination of parts.

Binaries being different is the whole point of having a single source code for a Free Software app.

Alas, who cares about binaries? We're reaching a point where things are compiled just-in-time!

PS: All this is my personal opinion.

Required in some industries (5, Interesting)

mrr (506) | about a year ago | (#44063613)

I work in the gaming (Gambling) industry.

Many states require us to submit both the source code and build tools required to make an exact (and I mean 'same md5sum') copy of the binary that is running on a slot machine on the floor.. to an extent that would blow you away.

They need to be able to go to the floor of a casino, rip out the drive or card containing the software, take it back to THEIR office, and build another exact image of the same drive or SD card.

md5sum from /dev/sda and /dev/sdb must match.

I can tell you the amount of effort that goes into this is monumental. There can be no dynamically generated symbols at compile time. The files must be built, compiled, and written to disk exactly the same way every time. The filesystem can't have modify or creation times because those would change.

This is a silly idea for open source software, the only industry I've seen apply it is perhaps the least-open one in the world.
