
Distributed Compilation, a Programmer's Delight

kdawson posted more than 5 years ago | from the less-time-for-warcraft dept.


cyberpead writes in with a developerWorks article on open source tools that can help speed up your builds by distributing compilation across multiple machines on a local area network.


What about Excuse #1? (4, Funny)

dsginter (104154) | more than 5 years ago | (#25761401)

Sorry - compiling [xkcd.com]

Re:What about Excuse #1? (1)

Andr T. (1006215) | more than 5 years ago | (#25761497)

If you get many slow machines and a slow network, it'll actually take longer to compile - and you'll still be able to happily say that.

Re:What about Excuse #1? (1)

debatem1 (1087307) | more than 5 years ago | (#25761687)

distcc scales pretty close to linearly. You'd have to have not-very-many slow machines and a slow network.

Re:What about Excuse #1? (1)

Andr T. (1006215) | more than 5 years ago | (#25761765)

Can you use the internet? Imagine using a p2p network... you compile the files and give the .obj back to the linker. This could take ages.

Re:What about Excuse #1? (3, Insightful)

Thiez (1281866) | more than 5 years ago | (#25763183)

That would allow people to inject malware, wouldn't it?

To compile:

#include <stdio.h>

void printhello() {
  printf("Hello world!\n");
}

evil bastard changes to:

#include <stdio.h>

void printhello() {
  {
    /* injected malicious code goes here */
  }
  printf("Hello world!\n");
}

Since the most practical way to spot the evil binary would be to compile the code yourself and compare, that sort of defeats the purpose of having someone else compile it. I guess you could have many random people compile the same piece of source code and then compare all the produced code, but that makes the whole thing rather complicated.
Also, the p2p thing would only be useful for open source, as I doubt it would be smart for people trying to produce some closed source product to send their source to a p2p network that may or may not store everything.
And this is all assuming the delays introduced by sending all this stuff over the internet are not so large that compiling locally is faster or almost as fast.

It's probably best to compile your stuff on your lan, on machines that are close, and that can be trusted.

Re:What about Excuse #1? (1)

nurb432 (527695) | more than 5 years ago | (#25766981)

If you are that paranoid, only use a farm where you have control over all the machines.

Re:What about Excuse #1? (1)

heson (915298) | more than 5 years ago | (#25763021)

I like Icecream, dunno why it feels better than distcc, maybe the bugs are fewer.

Re:What about Excuse #1? (1)

networkconsultant (1224452) | more than 5 years ago | (#25761743)

So if I build 4 machines with quad cores and configure distcc, does that mean I'll finally be able to build OpenOffice from source on Gentoo?

Re:What about Excuse #1? (-1, Flamebait)

maxume (22995) | more than 5 years ago | (#25761883)

I thought gentoo died in a fire. Did this not happen?

Re:What about Excuse #1? (1)

cbreaker (561297) | more than 5 years ago | (#25762129)

Blah blah blah Gentoo blah blah compiling forever blah blah punchline.

When I first used Gentoo several years ago it was with a 950 MHz Athlon CPU, and it wasn't too bad.

Now, with quad-core CPUs becoming the norm even in desktop machines, the compiling thing is even less of an issue.

I was able to run software on Gentoo that I could never get to run well together on any other distribution. You can almost always get the latest and greatest versions of everything with Gentoo. With kernels taking almost no time to compile these days, the source-distribution excuse just keeps getting weaker and weaker.

The main problem with Gentoo was that it was a moving target when it came to core system tools. They'd upgrade the ebuild system and it could sometimes hose your system. I haven't used Gentoo in a couple of years because of it, but I'd be willing to give it another shot eventually.

Re:What about Excuse #1? (2, Informative)

khellendros1984 (792761) | more than 5 years ago | (#25762313)

That was my problem. Broken ebuilds. Conflicting requirement lists that the updater script wasn't any good at working out. Gentoo made me run back to Slackware for a while, and eventually to Ubuntu (about 2-3 years ago, to see what the buzz was about).

Re:What about Excuse #1? (1)

networkconsultant (1224452) | more than 5 years ago | (#25762403)

Obviously you've never tried to compile OpenOffice. ;) (Even on a powerful system it takes days.)

Re:What about Excuse #1? (2, Insightful)

ORBAT (1050226) | more than 5 years ago | (#25763739)

If by powerful system you mean steam-powered Analytical Engine, yes, it'll take days.

The longest OpenOffice compile I've ever done was something around 5 hours, and that was with the system doing other stuff on the side. Distcc et al reduce the compile time to around 2h.

Re:What about Excuse #1? (1)

cbreaker (561297) | more than 5 years ago | (#25768747)

Well, not days, hours. Historically, Gentoo has provided some binary packages for software that takes an undue amount of time to compile and won't affect the rest of the system if compiled with generic options.

If I recall correctly, OpenOffice was one such package.

Gentoo isn't that masochistic.

Re:What about Excuse #1? (1)

Joe_Sextus (1363847) | more than 5 years ago | (#25762977)

I compiled OpenOffice 3 last Monday on a 2.0 GHz Core 2 Duo with 1 GB of RAM and no swap in 2 hours and 45 minutes. That's my best time yet.

Dilbert version is funnier (1)

istartedi (132515) | more than 5 years ago | (#25768383)

Tried to find it, but couldn't. It goes something like this:

Panel 1: PHB, walking by Dilbert's cube: Dilbert, why aren't you working?

Panel 2: Dilbert: My programs are compiling.

Panel 3: PHB, sitting back at his desk by himself, thought bubble: I wonder if my programs compile.

Re:What about Excuse #1? (1)

jerep (794296) | more than 5 years ago | (#25770565)

Compile times? I'm using D, you insensitive clod.

Bulk building is more effective (3, Informative)

TheThiefMaster (992038) | more than 5 years ago | (#25761431)

Due to a strange quirk in the way compilers are designed, it's (MUCH) faster to build a dozen files that include every file in your project than to build thousands of files.

Once build times are down to 5 - 15 minutes you don't need distributed compiling. The link step is typically the most expensive anyway, so distributed compiling doesn't get you much.
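
For illustration, a rough sketch of what such a bulk ("unity") translation unit can look like; the file names here are made up:

/* unity_build.c - compiled instead of the individual sources, so the
   shared headers are parsed once rather than once per source file */
#include "lexer.c"
#include "parser.c"
#include "codegen.c"
#include "main.c"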

Re:Bulk building is more effective (0)

Anonymous Coward | more than 5 years ago | (#25761703)

In my experience, the link step is way less expensive than the actual compilation. It takes me ~10 minutes to compile all my source files and less than a minute to perform the linking.

Re:Bulk building is more effective (0)

Anonymous Coward | more than 5 years ago | (#25761853)

you are not the typical case

Re:Bulk building is more effective (1)

loonycyborg (1262242) | more than 5 years ago | (#25761935)

Linking is much slower with debug symbols enabled than without, but still not slow enough to be significant.

Re:Bulk building is more effective (1)

TheThiefMaster (992038) | more than 5 years ago | (#25762017)

Or link-time code generation if you have that level of optimization turned on.

Preprocessing in C (4, Informative)

Frans Faase (648933) | more than 5 years ago | (#25762335)

I guess you are referring to the preprocessing step of C and C++ compilers, which I think was really a lame hack. If you have a lot of include files, preprocessing produces large intermediate files containing a lot of overlapping code that has to be compiled over and over again.

Preprocessing should have been replaced a long time ago, but because of nasty backwards-compatibility issues it never was. Other languages, such as Java and D, solve this problem in a much better way, as did Turbo Pascal with its TPU files in the late 1980s.
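
For illustration, a minimal sketch of the duplication (the header and file names are made up): every translation unit that includes the header gets its own full copy of its text after preprocessing, and the compiler parses that copy again.

/* big.h - stands in for thousands of lines of declarations */
#ifndef BIG_H
#define BIG_H
int helper(int x);
#endif

/* a.c */
#include "big.h"
int a(void) { return helper(1); }

/* b.c */
#include "big.h"
int b(void) { return helper(2); }

/* Running gcc -E a.c and gcc -E b.c shows the entire contents of big.h
   expanded into both preprocessed files; nothing parsed for a.c is
   reused when b.c is compiled. */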

Re:Preprocessing in C (0)

Anonymous Coward | more than 5 years ago | (#25762967)

Actually, it's more about disk I/O than strictly preprocessing (though re-doing preprocessing over and over is awfully inefficient also).

Imagine you have 500 .cpp files and they all include a tree of 10-100 header files.

Compiling each .cpp in turn requires 10-100 files to be read off the disk each time.

Batching all the .cpp files into a single file which also includes all 10-100 headers means orders of magnitude fewer reads, and thus faster builds.

I imagine there is probably a limit to how large a project you can do this on (though we haven't hit it, and we're at 8k+ .cpp and .h files).

Re:Preprocessing in C (1)

Charan (563851) | more than 5 years ago | (#25763069)

Compiling each .cpp in turn requires 10-100 files to be read off the disk each time.

Modern operating systems get around this issue with a disk cache. In reality, 100 files will be read off the disk for the first compile, and the rest of the compiles will just access the cached copy in memory (unless memory is in short supply on your system).

Re:Preprocessing in C (2, Informative)

Anonymous Coward | more than 5 years ago | (#25763171)

It's not sufficient for large projects; disk I/O is still a very large overhead when compiling. Switching to a 'unity' build scheme reduced compile times significantly (more so than the distributed compile solution we used, since that still had to read the files off disk multiple times in addition to sending them over the wire to multiple machines). The .cpp and .h files make up about 110 MB on our project.

Re:Preprocessing in C (0)

Anonymous Coward | more than 5 years ago | (#25763497)

Addendum: when we were doing our profiling, that 110 MB turned into tens of GB of data being read from and written to disk in a single compile.

Re:Bulk building is more effective (1)

swillden (191260) | more than 5 years ago | (#25766599)

Due to a strange quirk in the way compilers are designed, it's (MUCH) faster to build a dozen files that include every file in your project than to build thousands of files.

True of Visual C++, not true of any other compiler I'm familiar with.

Re:Bulk building is more effective (1)

jgrahn (181062) | more than 5 years ago | (#25789249)

Due to a strange quirk in the way compilers are designed, it's (MUCH) faster to build a dozen files that include every file in your project than to build thousands of files. Once build times are down to 5 - 15 minutes you don't need distributed compiling.

But your code will be harder to understand. You're giving up a lot of tools, like static globals in C and anonymous namespaces in C++.
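
For illustration, a minimal sketch of the collision (file names made up): each file below compiles fine on its own, but once a bulk-build wrapper includes both into one translation unit, the two file-scope statics clash.

/* widget.c */
static int count = 0;   /* private to widget.c in a normal build */
void widget_add(void) { count++; }

/* gadget.c */
static int count = 0;   /* private to gadget.c in a normal build */
void gadget_add(void) { count++; }

/* unity.c - the bulk-build wrapper */
#include "widget.c"
#include "gadget.c"     /* error: redefinition of 'count' */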

Every time I have encountered painfully long compile times, the cause has been sloppiness. Usually, the direct cause is that no one has maintained the Makefile, so there's an untrusted heap of recursive Makefiles which only work if you do a "make clean" first, and which don't work with GNU make's -j flag.

Throwing hardware at such a problem would feel like running Bubble Sort on a supercomputer.

The link step is typically the most expensive anyway, so distributed compiling doesn't get you much.

What linker are you using, and how do you use it? For GNU ld, the work seems to be mostly file I/O. I have seen one insane linker which took 15 minutes and 300 MB of virtual memory to link something that ended up as a few megabytes of object code, but I hope that's not typical.

Very Cool (1)

dintech (998802) | more than 5 years ago | (#25761433)

Imagine a Beowulf cluster of those.

OMFG!!!!111one (-1, Troll)

Anonymous Coward | more than 5 years ago | (#25761565)

I need a towel. Thanks.

Is this new? (2, Insightful)

daveewart (66895) | more than 5 years ago | (#25761615)

Article summary: use 'make -j', 'distcc', and 'ccache', or some combination of these. These utilities are well known and widely used already, no?

Re:Is this new? (1)

cbreaker (561297) | more than 5 years ago | (#25762199)

Yeah. I used to use distcc a lot about five years ago. It doesn't help with every part of a build, but it can help if you're compiling something big like X11 or KDE.

Re:Is this new? (0)

Anonymous Coward | more than 5 years ago | (#25765913)

No, we're still discovering it. And worse, recreating much of it, since we don't know it already exists.

Re:Is this new? (0)

Anonymous Coward | more than 5 years ago | (#25770543)

If you want something new, look at distmake [sourceforge.net] .

It's GNU Make, extended to run jobs on multiple hosts. It's very fast.

Re:Is this new? (1)

Slashdot Parent (995749) | more than 5 years ago | (#25770941)

Yeah, I was wondering the same thing. distcc and ccache have been a staple of Gentoo users since forever.

Minor error (4, Informative)

pipatron (966506) | more than 5 years ago | (#25761697)

There's a minor error in the article, which claims that your servers need access to the source. distcc was designed to not need this.

Re:Minor error (2, Insightful)

SleptThroughClass (1127287) | more than 5 years ago | (#25761977)

There's a minor error in the article, which claims that your servers need access to the source. distcc was designed to not need this.

That implies you read the article, but that can't be the case.

Re:Minor error (2, Informative)

cbreaker (561297) | more than 5 years ago | (#25762281)

I read it too, and it's true: they DO say all of the machines need access to the source, which they do not.

Maybe there are some special cases, but I've never had to have a shared source repository in order to use distcc.

They also say the machines need to be exactly the same configuration, and they do elaborate on that a little bit, but it's not strictly true. Depending on the source you're compiling, you might only need the same major version of GCC.

Re:Minor error (0)

Anonymous Coward | more than 5 years ago | (#25762493)

False - distcc does require the same version of the OS. I've tried using distcc to spread compilation of the kernel across an Ubuntu and a Red Hat machine that I have (both x64), and it refused.

It depends on how the OS identifies itself. Also, I'd recommend using the Avahi version of distcc (Fedora 9 comes with it; I believe it finally made it into Intrepid), which automatically detects servers on your network.

Re:Minor error (2, Informative)

pipatron (966506) | more than 5 years ago | (#25763055)

False - I regularly use Windows as a host, for example, running distcc under Cygwin. The only thing you have to make sure of is that the compiler called by distcc will create object files for the client system. You can have a render farm of SPARCs generating code for your ARM router if you like.

Re:Minor error (1)

DogAlmity (664209) | more than 5 years ago | (#25763097)

False False - distcc does not require the same version of the OS. I spread compilation across four machines. The host was Gentoo, one of the other machines was Gentoo, and the other two were machines I just stuck the latest Knoppix CD in.

The Gentoo machines were using gcc 4.1.x for i586, Knoppix had gcc 4.1.x for i386; all 32-bit. The resulting build was lightning fast and error-free. This was an app, not the kernel.

Re:Minor error (0)

Anonymous Coward | more than 5 years ago | (#25763213)

False - distcc doesn't give a fuck what OS you're running it on. Your OS, on the contrary, may.

Re:Minor error (2, Insightful)

cbreaker (561297) | more than 5 years ago | (#25768785)

"False - distcc does require the same version of the OS. I've tried using distcc to spread compilation of the kernel across a ubuntu & redhat machine that I have (both x64), and it refused."

You didn't read my post or you have low comprehension skills. I said "Depending on the source you're compiling."

A Kernel build might require specific libraries to be the same version. Building Firefox might not. Some apps you can build on Linux and use a cygwin box running distcc to help. Others you cannot.

It's not a RULE. It depends on the source. That's what I said. Then again, you might have actually realized that, and that's why you posted AC.

In other news (4, Funny)

adonoman (624929) | more than 5 years ago | (#25761737)

Slashdot readership plummets to an all-time low as programmers actually have to work.

Re:In other news (1)

tcopeland (32225) | more than 5 years ago | (#25761815)

> Slashdot readership plummets to an all-time low as programmers actually have to work.
Not at Sun [cnn.com], though... yikes.

Re:In other news (1)

adonoman (624929) | more than 5 years ago | (#25762803)

Ouch!

raggle fraggle (2, Funny)

TheRealMindChild (743925) | more than 5 years ago | (#25761829)

Sky rockets in flight... distcc delight......
distcc deliiiiiiiight.

Re:raggle fraggle (1)

i.r.id10t (595143) | more than 5 years ago | (#25761945)

Thought thief!

distcc has one fatal flaw (0)

Anonymous Coward | more than 5 years ago | (#25762173)

It requires all preprocessing (the -E option to gcc) to be done on one machine. This can frequently create such a big bottleneck (especially on C++ code) that it swamps any gain from distributing the object compilation. The preprocessing takes so long that the rest of the machines sit idle waiting for jobs, and you are also sending huge text files across the network. I know distcc supports compressing these files, but even that is done serially on the initial compile machine.

distcc is also very flawed in a multi-user environment when it comes to fair distribution of jobs. You typically end up with one machine bogged down, and the compiles actually end up taking longer than just doing everything locally.

Re:distcc has one fatal flaw (3, Informative)

TeknoHog (164938) | more than 5 years ago | (#25762331)

From man distcc:

In pump mode, distcc runs the preprocessor remotely too. To do so, the preprocessor must have access to all the files that it would have accessed if it had been running locally. In pump mode, therefore, distcc gathers all of the recursively included headers, except the ones that are default system headers, and sends them along with the source file to the compilation server.

Wrong stuff... (0)

Anonymous Coward | more than 5 years ago | (#25762359)

The article says each machine involved in a distcc compile must have access to all source files. This is not true: distcc runs the pre-processing stage, where all the header files are included, before sending the processed file to the machine(s) for compilation. They do not need the source, nor even the includes/libs the software links against. All of that happens on the central machine.

We can't take TFA's author's adivce! (1)

idontgno (624372) | more than 5 years ago | (#25762405)

He's using TCSH! That's BAD FOR YOU! [faqs.org]

OK, enough offtopic. This is actually pretty cool, considering our development environment is clusters and clusters of IBM P-Series LPARs, and our codebase is (A) disgustingly huge, and (B) actually pretty amenable to parallelized make.

FINALLY, I can justify to my boss that browsing /. is research! (Now if I could just make a good case for 4chan...)

SMEs (2)

hachete (473378) | more than 5 years ago | (#25762629)

The reason for a lot of build machines in the rack may not be horsepower. Rather, you need x different machine versions, or a certain build only builds on a certain machine because of licence restrictions, or you only have one Windows box with the Japanese character set installed because it causes so many problems that multiplying them just isn't worth it, and so on. Building across n copies of the same machine version just isn't worth the work, IMO. Just get a bigger machine and save on machine maintenance.

So the real benefit of distcc might be parallel compilation. I see a big future for this, particularly with multi-core chips becoming commonplace. Once upon a time I would not countenance a dual-chip machine in the rack, because of the indeterminate mayhem it would sometimes cause to a random piece of code deep in the bowels. Those problems are long gone.

Umm. I wonder how this plays out with VMware? A distributed compiler smart enough to use the (correct) local compiler across a varied build set would be worth having...

WFC (0)

Anonymous Coward | more than 5 years ago | (#25762699)

...snooze...

No distcc hints... (1)

GenP (686381) | more than 5 years ago | (#25763469)

Dang, no info on creating uniform toolchains for each distcc arch. IncrediBuild at work is really good about that, though it has the distinct advantage of being able to just shoot a single executable over the wire if the remote end needs it.

Icecream. (2, Interesting)

Sir_Lewk (967686) | more than 5 years ago | (#25763961)

If you are interested in distributed compiling, you may want to check out icecream. http://en.opensuse.org/Icecream [opensuse.org]

It's similar to distcc, but with some notable benefits.

'native' breaks distcc (1)

Progman3K (515744) | more than 5 years ago | (#25765207)

If you want to distribute compilations, you must not use gcc's 'native' option (-march=native): it causes each compiler instance to emit code tuned to the machine it runs on, and your compilation hosts may not all be identical.

Only on Apollo (0)

Anonymous Coward | more than 5 years ago | (#25765757)

The only place this ever worked well, and transparently, was when using DSEE on Apollo.

But then, it could also reuse .o files already in place from some other developer's compilation. It was rigorous enough to know the dependencies leading to that .o were identical, including environment issues.

I've found a better solution altogether (1)

Tablizer (95088) | more than 5 years ago | (#25769375)

It's called an "interpreter".

(Cue flamewar in 3...2...1...
     
