
Auto-Parallelizing Compiler From Codeplay

ScuttleMonkey posted more than 7 years ago | from the code-writes-you dept.

Max Romantschuk writes "Parallelization of code can be a very tricky thing. We've all heard of the challenges with Cell, and with dual and quad core processors this is becoming an ever more important issue to deal with. The Inquirer writes about a new auto-parallelizing compiler called Sieve from Codeplay: 'What Sieve is is a C++ compiler that will take a section of code and parallelize it for you with a minimum hassle. All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel.' There is more info on Sieve available on Codeplay's site."

147 comments

so.. (-1, Troll)

mastershake_phd (1050150) | more than 7 years ago | (#18297070)

So this makes you numb from the waist down?

(my apologies to the handicapped people out there)

Re:so.. (1)

maxume (22995) | more than 7 years ago | (#18297256)

I'm impressed. The website you are pimping by posting dozens of inane comments to /. doesn't even have any ads on it.

Re:so.. (-1, Troll)

mastershake_phd (1050150) | more than 7 years ago | (#18297276)

They aren't all inane....

Re:so.. (-1, Troll)

Anonymous Coward | more than 7 years ago | (#18297352)

No, but this does [encycloped...matica.com] !

Reentrant? (5, Interesting)

Psychotria (953670) | more than 7 years ago | (#18297084)

Forgive me if I'm wrong (I've not coded parallel things before), but if the code is re-entrant, does this go a long way towards running the code in parallel? Obviously there are other factors involved here, like addressing memory, but that is already accounted for in re-entrant programming. I'm not sure what the difference is... please enlighten me :-)

Re:Reentrant? (5, Informative)

Anonymous Coward | more than 7 years ago | (#18297186)

Reentrancy is a factor, because it's a class of dependencies, but there are many other dependencies.

Consider a for loop: for (int i = 0; i < 100; i++) doSomething(i);

Can this be parallelized? Perhaps the author meant it like it's written there: First doSomething(0), then doSomething(1), then ... Or maybe he doesn't care about the order and doSomething just needs to run once for each i in 0..99. The art of automatic parallelization is to find overspecifications like the ordered loop where order isn't really necessary. If nothing in doSomething depends on the outcome of doSomething with a different i, they can be run in parallel and in any order. Suppose each doSomething involves a lengthy calculation and an output at the end. Then they can't simply run in parallel, because the output is a dependency: As written, the output from doSomething(0) comes before doSomething(1) and so on. But the compiler could still run the lengthy calculation in parallel and synchronize only the fast output at the end. The more of these opportunities for parallelism the compiler can find, the better it is.
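As a concrete, hand-written illustration of that split (my sketch, not compiler output), here is roughly what the transformation amounts to with plain C++ threads: run the lengthy calculations in parallel, then reproduce the ordered output serially afterwards. The names and the striding scheme here are just for the sketch.

    #include <algorithm>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Stand-in for the lengthy, independent calculation inside doSomething(i).
    long long expensiveCalculation(int i) {
        long long acc = 0;
        for (int k = 0; k < 1000000; ++k) acc += (long long)i * k;
        return acc;
    }

    int main() {
        const int n = 100;
        std::vector<long long> results(n);
        std::vector<std::thread> workers;

        // Parallel phase: the calculations are independent, so the
        // iterations can run concurrently and in any order.
        const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
        for (unsigned t = 0; t < nthreads; ++t) {
            workers.emplace_back([&, t] {
                for (int i = (int)t; i < n; i += (int)nthreads)
                    results[i] = expensiveCalculation(i);
            });
        }
        for (auto& w : workers) w.join();

        // Serial phase: the output order is the only real dependency,
        // so it is done afterwards, in the original order.
        for (int i = 0; i < n; ++i)
            std::cout << i << ": " << results[i] << '\n';
    }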

Re:Reentrant? (4, Interesting)

644bd346996 (1012333) | more than 7 years ago | (#18297288)

In the case of the for loop, that is really a symptom of the fact that C-style languages don't have syntax for saying "do this to each of these". So one must manually iterate over the elements. Java does have the for-each syntax, but it is just an abbreviation of the "for i from 0 to x" loop.

Practically all for loops written are independent of order, so they could be trivially implemented using MapReduce. That one change would parallelize a lot of code, with no tricky compiler optimizations.

Re:Reentrant? (1)

mrchaotica (681592) | more than 7 years ago | (#18297528)

Interesting! I'd never heard of "map and reduce" before, but it seems like just the sort of idiom that would be useful for the program I'm getting ready to write.

Would it make sense to have an interface for it like this in C:

void map(int (*op)(void*), void** data, int len);

Where the implementation could be either a for loop:

void map(int (*op)(void*), void** data, int len)
{
    int i;
    for (i = 0; i < len; i++) {
        (*op)(data[i]);
    }
}

or an actual parallel implementation, such as one using pthreads or running on a GPU?
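For what it's worth, here is a rough, untested sketch of a parallel variant with the same signature, using C++ standard threads instead of pthreads (a pthreads version would follow the same shape):

    #include <algorithm>
    #include <thread>
    #include <vector>

    void map(int (*op)(void*), void** data, int len)
    {
        const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;

        for (unsigned t = 0; t < nthreads; ++t) {
            workers.emplace_back([=] {
                // Each worker takes a strided slice of the array; op must be
                // safe to call concurrently on distinct elements.
                for (int i = (int)t; i < len; i += (int)nthreads)
                    (*op)(data[i]);
            });
        }
        for (auto& w : workers) w.join();
    }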

Re:Reentrant? (3, Informative)

prencher (971087) | more than 7 years ago | (#18297758)

Re:Reentrant? (1)

mrchaotica (681592) | more than 7 years ago | (#18297852)

Well, I meant in the more general sense (i.e., your mention of "MapReduce" eventually led me to find this [wikipedia.org] ).

Re:Reentrant? (1)

MillionthMonkey (240664) | more than 7 years ago | (#18297982)

Java does have the for-each syntax, but it is just an abbreviation of the "for i from 0 to x" loop.

I would be very surprised if nobody's tried to work around this problem with AOP and annotations. It would be trivial to code parallelization with a method interceptor.

On a multiprocessor machine the JVM will assign threads to individual CPUs. Your interceptor would populate an array of N threads, assign each thread a number, wrap the annotated method in a callback with a finally {lock.notify()} at the end, synchronize on the lock, start all worker threads, and wait on the lock N times.

The client programmer would annotate a method this way:

@parallelizable(threadIndex = "processorNum")
public void doSomething(int processorNum, Object otherParameter, int etc) {...}


The value of the named int parameter would then be replaced with the thread index by the interceptor, and doSomething() would query that parameter and execute its portion of the for loop.

Or you could apply data parallelization, like parallel Fortran did, which supported a meta-language in the comments. You'd declare an array and then annotate each dimension with compiler directives like *, BLOCK, or SCATTER. BLOCK split the array into contiguous chunks across N CPUs, SCATTER assigned elements to CPUs so that element i went to the (i mod N)-th CPU, and * was for array dimensions that you didn't want to distribute. So instead of passing down a thread index, the annotation would apply to arrays and array dimensions, or preferentially to list collections (to avoid having to copy an array into N subarrays):

@parallelizable(array-names="politicians, donors, amounts, votes" array-distributions="[BLOCK], [SCATTER], [BLOCK,SCATTER], [*,BLOCK]")
public void findPatterns(List<Politician> politicians, Donor[] donors, float[][] amounts, List<List<Vote>> votes) {...}


But this is all really ugly. It would have been so much easier if the language were just designed properly.

Re:Reentrant? (1)

gerddie (173963) | more than 7 years ago | (#18298654)

"Practically all for loops written are independent of order"

I beg to differ:

      ifstream datafile("datafile.txt");
      vector<float> val(N);
      for (size_t i = 0; i < N; ++i) {
              datafile >> val[i];
              if (!datafile.good())
                  throw fileread_exception("datafile.txt");
      }

      for (size_t i = 0; i < N-1; ++i)
                val[i] = val[i] - val[i+1];
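For comparison, the second loop can be rewritten by hand into an order-independent form by writing into a separate vector -- a small sketch of exactly the kind of dependency removal an auto-parallelizer would have to perform or be told about:

      // Each iteration now reads only the original 'val' and writes its
      // own slot in 'diff', so the iterations no longer depend on each
      // other's order and could safely run in parallel.
      vector<float> diff(N-1);
      for (size_t i = 0; i < N-1; ++i)
                diff[i] = val[i] - val[i+1];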

Re:Reentrant? (5, Informative)

jd (1658) | more than 7 years ago | (#18297560)

Simple version: Parallel code need not be re-entrant, but all re-entrant code is parallel.

More complex version: There are four ways to run a program. These are "Single Instruction, Single Data" (ie: a single-threaded program), "Single Instruction, Multi Data" (SETI@Home would be an example of this), "Multi Instruction, Single Data" (a good way to program genetic algorithms) and "Multi Instruction, Multi Data" (traditional, hard-core parallelism).

SIMD would need to be re-entrant to be parallel, otherwise you can't be running the same instructions. (Duh. :) SIMD is fashionable, but is limited to those cases where you are operating on the data in parallel. If you want to experiment with dynamic methods (heuristics, genetic algorithms, self-learning networks), or want to apply multiple algorithms to the same data (eg: data-mining, using a range of specialist algorithms), then you're going to be running a vast number of completely different routines that may have no components in common. If so, you wouldn't care if they were re-entrant or not.

In practice, you're likely to use a blend of SIMD, MISD and MIMD in any "real-world" program. People who write "pure" code of one type or another usually end up with something that is ugly, hard to maintain and feels wrong for the problem. On the other hand, it usually requires the fewest messaging and other communication libraries, as you're only doing one type of communication. You can also optimize the hell out of the network, which is very likely to saturate with many problems.

off topic (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#18297086)

not registering just to fp

FPP (5, Funny)

DigitAl56K (805623) | more than 7 years ago | (#18297096)

Frtprallps
is arle ot

Re:FPP (0)

Anonymous Coward | more than 7 years ago | (#18297410)

please, remove this shadow over me and show me the meaning of this, since obviously I am not as smart as you (or the other people who said it was funny)

Re:FPP (2, Informative)

MarkRose (820682) | more than 7 years ago | (#18297450)

"first parallel post"

Re:FPP (1)

thrawn_aj (1073100) | more than 7 years ago | (#18297564)

"first parallel post"
Oh noes DigitAl56K, how did he crack your code? :P Probably thinks in parallel himself =D.

Re:FPP (1)

Z0mb1eman (629653) | more than 7 years ago | (#18297616)

It's funny to anyone who's ever tried debugging a multithreaded app using output statements. :p

(I'm explaining someone else's programming joke on Slashdot... I've reached a new low).

Sad... (2, Funny)

jd (1658) | more than 7 years ago | (#18297588)

I mean, it was only running on two threads AND showed clear signs of excess barrier operations at the end of every character. From here on out, I expect first parallel posts to run over at least four threads and not be sequentially-coherent. The world is moving towards async! Don't let first posts suffer with past limitations!

Re:FPP (0, Troll)

dascandy (869781) | more than 7 years ago | (#18298462)

Frtprle ot
is aallps

You spelling-clot.

This is Awesome (5, Funny)

baldass_newbie (136609) | more than 7 years ago | (#18297110)

I loved 'Clocks'. Oh wait, Codeplay...not Coldplay.
Nevermind.

Oh look. A duck.

Re:This is Awesome (0)

NotQuiteReal (608241) | more than 7 years ago | (#18297208)

You have confused Auto-Parallelizing Compiler From Codeplay

For
Auto-Compiling Code from ParallelPlay
or
Auto-Play Parallelizer from Complay
or something...

Re:This is Awesome (0, Redundant)

Criminally Insane Ro (982870) | more than 7 years ago | (#18297590)

it's like html code for c++

Re:This is Awesome (-1, Troll)

Anonymous Coward | more than 7 years ago | (#18298046)

Do you know Why I know your gay...

openMP (2, Informative)

Anonymous Coward | more than 7 years ago | (#18297126)

And what's the difference between this and OpenMP?

Re:openMP (1)

compact_support (968176) | more than 7 years ago | (#18297218)

Just a vendor's incompatible me-too implementation. I'm sure there are some semantic differences and maybe some new features, but it's the same thing. This product may also be aimed more at multicore desktops than SMP big iron like openmp is. I'm partial to MPI (Specifically, OcamlMPI) myself. 20 cores at 4.3 cpu-days and it was trivial to achieve >99% CPU utilization on all nodes.

Re:openMP (1)

init100 (915886) | more than 7 years ago | (#18299134)

This product may also be aimed more at multicore desktops than SMP big iron like openmp is.

And the difference would be?

Re:openMP (2, Informative)

ioshhdflwuegfh (1067182) | more than 7 years ago | (#18298712)

And what's the difference between this and OpenMP?
On page 7 of The Codeplay Sieve C++ Parallel Programming System, 2006 [codeplay.com] you'll find a section that describes the "advantages" of Codeplay over OpenMP, but nothing terribly exciting. Codeplay does indeed let you automate parallelization more, but at the same time it is limited to a narrower set of optimizations compared to OpenMP.

Hey! Let's reinvent OpenMP! (0, Redundant)

Anonymous Coward | more than 7 years ago | (#18297184)

And call it "automatic" while we're at it.

Shouldn't this be at http://ads.slashdot.org/ [slashdot.org] instead of http://it.slashdot.org [slashdot.org] ?

Re:Hey! Let's reinvent OpenMP! (1)

Duncan3 (10537) | more than 7 years ago | (#18297200)

You got it.

Re:Hey! Let's reinvent OpenMP! (2, Interesting)

grub (11606) | more than 7 years ago | (#18297424)


Our SGI compilers at work come with an -apo (automatic parallelization optimization) command line option. That one option cost us a pretty penny. It's nice to see other people getting in on the action.

Snippet from the manpage, highlighting is mine:

-apo, -apokeep, -apolist
For -n32 and -64, it invokes the Auto-Parallelizing Option
(APO), which automatically converts sequential code into
parallel code by inserting parallel directives where it is
safe and beneficial to do so. Specifying -apo also sets the
-mp option. Both -apokeep and -apolist produce a listing
file, file.list. Specifying -apokeep retains file.anl and
file.m, which can be used by the parallel analyzer, ProDev
ProMP (see the EXAMPLES section). When the -IPA option is
specified with -apokeep, the default settings for IPA
suboptions are used with the exception of -IPA:inline, which
is set to OFF.
APO is invoked only if you are licensed for it. For licensing
information, see your sales representative.

For more information on APO, its directives, and command-line
options, see MIPSpro C and C++ Pragmas.

When specifying the -o32 option on the cc command line, -apo
invokes the IRIS Power C analyzer (PCA). See the -pca option
description.

Re:Hey! Let's reinvent OpenMP! (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#18298540)

That one option cost us a pretty penny.

That was apparently not enough pretty pennies to keep SGI afloat.

Interesting, but.. (5, Insightful)

DigitAl56K (805623) | more than 7 years ago | (#18297198)

The compiler will put out code for x86, Ageia PhysX and Cell/PS3. There were three tests talked about today, CRC, Julia Ray Tracing and Matrix Multiply. All were run on 8 cores (2S Xeon 5300 CPUs) and showed 739, 789 and 660% speedups respectively.

That's great - but do the algorithms involved here naturally lend themselves to the parallelization techniques the compiler uses? Are there algorithms that are very poor choices for parallelization? For example, can you effectively parallelize a sort? Wouldn't each thread have to avoid exchanging data elements any other thread was working on, and therefore cause massive synchronization issues? A solution might be to divide the data set by the number of threads and then after each set was sorted merge them in order - but that requires more code tweaking than the summary implies. So I wonder how different this is from Open/MT?

Re:Interesting, but.. (1)

DigitAl56K (805623) | more than 7 years ago | (#18297206)

Gah! "OpenMP".

Re:Interesting, but.. (5, Interesting)

Anonymous Coward | more than 7 years ago | (#18297306)

For example, can you effectively parallelize a sort? Wouldn't each thread have to avoid exchanging data elements any other thread was working on, and therefore cause massive synchronization issues?

Yes you can, take a look at merge sort [wikipedia.org] (or quicksort, same idea). You split up the large data set into smaller ones, sort those and recombine. That's perfect for parallelization -- you just need a mechanism for passing out the original elements and then recombining them.

So if you had to sort 1B elements, maybe you get 100 computers and give them each 1/100th of the data set. That's manageable for one computer to sort easily. Then just develop a service that hands you the next element from each machine, and you pull off the lowest one.
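On a single multi-core machine the same divide-and-merge idea fits in a few lines. A rough sketch with C++ standard library threads (untested; it splits only once, into two halves, for clarity):

    #include <algorithm>
    #include <thread>
    #include <vector>

    // Sort each half in its own thread, then merge the two sorted halves.
    // Recursing further (up to the core count) gives more parallelism.
    void parallelSort(std::vector<int>& v)
    {
        const size_t mid = v.size() / 2;
        std::thread left ([&] { std::sort(v.begin(), v.begin() + mid); });
        std::thread right([&] { std::sort(v.begin() + mid, v.end());   });
        left.join();
        right.join();
        std::inplace_merge(v.begin(), v.begin() + mid, v.end());
    }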

Re:Interesting, but.. (0)

Anonymous Coward | more than 7 years ago | (#18297494)

How redundant. You just gave exactly the same solution as the post you replied to.

Re:Interesting, but.. (3, Interesting)

DigitAl56K (805623) | more than 7 years ago | (#18297542)

If you read my post, this is exactly what I suggested. The actual point was that it requires more than simply putting "beginning and end tags" on the code, e.g. it is not automatic.

I would also ask this of CodePlay: If your compiler is automatic, why do we need to add beginning and end tags? :)

Re:Interesting, but.. (1)

maxwell demon (590494) | more than 7 years ago | (#18298906)

Is there any modern programming language which doesn't provide a sort function in its standard library? Because if you use that, the vendor can simply provide a parallelized version, and you don't have to compare if the vendor parallelized that function manually, or the compiler parallelized it automatically, or even a mixture of both.

Re:Interesting, but.. (1)

maxwell demon (590494) | more than 7 years ago | (#18298928)

s/compare/care/

Re:Interesting, but.. (1)

ioshhdflwuegfh (1067182) | more than 7 years ago | (#18298730)

do the algorithms involved here naturally lend themselves to the parallelization techniques the compiler uses?
What do you mean?

Are there algorithms that are very poor choices for parallelization?
yes, finite state machines.

snake oil (4, Insightful)

oohshiny (998054) | more than 7 years ago | (#18297216)

I think anybody who is claiming to get decent automatic parallelization out of C/C++ is selling snake oil. Even if a strict reading of the C/C++ standard ends up letting you do something useful, in my experience, real C/C++ programmers make so many assumptions that you can't parallelize their programs without breaking them.

Re:snake oil (1)

TubeSteak (669689) | more than 7 years ago | (#18297346)

"All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel"

The compiler isn't going to know if you're doing something stupid or not.
In other words: use at your own risk.

The old adage of "garbage in, garbage out" still applies.

Re:snake oil (2, Insightful)

mastershake_phd (1050150) | more than 7 years ago | (#18297448)

"All you really need to do is take the code you want to run across multiple CPUs and put beginning and end tags on the parts you want to run in parallel" The compiler isn't going to know if you're doing something stupid or not. In other words: use at your own risk. The old adage of "garbage in, garbage out" still applies.

But how are you supposed to know exactly how something is going to run under this? Even with a good understanding of what you're trying to do and (hopefully) what exactly the compiler is doing, you still might get some weird results in certain situations. It might work, but it's still going to take lots of trial and error, or at least a lot of verification.

Re:snake oil (1)

maxwell demon (590494) | more than 7 years ago | (#18298990)

But how are you supposed to know exactly how something is going to run under this?

The semantics of that construct is well-defined.

Of course from the short description it's not entirely clear to me if the compiler actually implements that semantics, or simply relies on you to honor it (e.g. is it possible to call a non-sieve function from within a sieve function or block? In that case, the compiler cannot reasonably implement the semantics). There's a precedent for the second type of semantics: restrict.

But if the semantics is enforced, you know exactly what you get.

Re:snake oil (2, Informative)

ariels (6608) | more than 7 years ago | (#18298264)

TFA specifically mentions that you need to mark up your code with sieves:
  1. A sieve is defined as a block of code contained within a sieve {} marker and any functions that are marked with sieve.
  2. Inside a sieve, all side effects are delayed until the end of the sieve.
  3. Side effects are defined as modifications of data that are declared outside the sieve.
The compiler can use this information to decide what parts of the code can safely be parallelized. Adding the "sieve" keyword can change the semantics of the code; adding it correctly is your responsibility.
Not sure I find the particular concept appealing for programming -- just trying to straighten out the claim of the article.
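Going by those three rules, marked-up code would presumably look something like the sketch below (this is my reading of the quoted description, not verified against Codeplay's actual syntax; expensiveFunction is just a placeholder):

    const int N = 1024;
    float in[N], out[N];   // declared outside the sieve block

    sieve {
        for (int i = 0; i < N; ++i) {
            // Writing out[i] modifies data declared outside the sieve,
            // so per rule 2 the write is queued and only applied when
            // the block ends. With no visible side effects inside the
            // block, the compiler is free to reorder or parallelize
            // the iterations.
            out[i] = expensiveFunction(in[i]);
        }
    }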

Prefer OpenMP (5, Informative)

drerwk (695572) | more than 7 years ago | (#18297238)

I have some small amount of experience with OpenMP http://openmp.org/ [openmp.org], which allows one to modify C++ or Fortran code using pragmas to direct the compiler regarding parallelization of the code. The Codeplay white paper made this sound much like it implements one of the dozen or so OpenMP patterns. I am fairly skeptical that Codeplay has any advantage over OpenMP, but the white paper lists some purported advantages. I will not copy them here and take the fun out of reading them for yourself. I will list OpenMP advantages:
1. OpenMP is supported by Sun, Intel, IBM, $MS(?), etc., and is implemented in gcc 4.2.
2. OpenMP has been used successfully for about 10 years now, and is at release 2.5 of the spec.
3. It is open -- the Codeplay white paper mentions it being protected by patents. (boo hiss)
4. Did I mention that it is supported in gcc 4.2, which I built on my PowerBook last week, and that it is very cool?

So maybe Codeplay is a nice system. Maybe they even have users and can offer support. But if you are looking to make your C++ code run multi-threaded with the least amount of effort I've seen (it is still effort!), take a look at OpenMP. In my simple tests it was pretty easy to make use of OpenMP, and I am looking forward to trying it on a rather more complicated application.
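For a feel of what that effort looks like, a minimal sketch of my own (not from the white paper) -- one pragma on an ordinary loop. Compile with -fopenmp in gcc 4.2; without the flag the pragma is simply ignored and the loop runs serially:

    #include <cstdio>

    int main()
    {
        const int n = 1000000;
        double sum = 0.0;

        // The pragma asks the compiler to split the iterations across
        // threads; 'reduction' gives each thread a private partial sum
        // and combines them at the end.
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; ++i)
            sum += 1.0 / (i + 1);

        std::printf("harmonic(%d) = %f\n", n, sum);
        return 0;
    }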

Re:Prefer OpenMP (4, Informative)

PhrostyMcByte (589271) | more than 7 years ago | (#18297282)

Don't forget the other end of the development spectrum - Visual C++ 2005 has builtin OpenMP support too.

Re:Prefer OpenMP (1)

drerwk (695572) | more than 7 years ago | (#18297344)

The PowerBook reference should have identified me as an acolyte of Steve! Add to that, I am stuck in my day job using VS 2003 for some management reason I forget at the moment. But that aside, do you use OpenMP in VS2005, or know of people who are doing so? I just went over the full 2.5 spec so I can map out some strategy for trying OpenMP with our existing software. It is almost 1M lines, and was not designed with multi-threading in mind. I do think there will be places I can use OpenMP, and the ability to specify exactly where and how gives me some optimism.

Re:Prefer OpenMP (1)

PhrostyMcByte (589271) | more than 7 years ago | (#18297368)

I use it sometimes, for simple things. Most of the time I do my own threading though - VC2005 requires you to distribute a vcomp.dll with your app which is a bit of a turnoff.

Re:Prefer OpenMP (4, Interesting)

jd (1658) | more than 7 years ago | (#18297656)

Personally, I would agree with you. I have to say I am not fond of OpenMP - I grew up on Occam, and these days Occam-Pi blows anything done in C out of the water. (You can write threads which can auto-migrate over a cluster, for example. Even OpenMOSIX won't work at a finer granularity than entire processes, and most compile-time parallelism is wholly static after the initial execution.)

On the other hand, OpenMP is a far more solid, robust, established, reputable, reliable solution than Codeplay. The patent in Codeplay is also bothersome - there aren't many ways to produce an auto-parallelizing compiler and they've mostly been done. This means the patent either violates prior art (most likely), or is such "black magic" that no other compiler-writing company could expect to reproduce the results and would be buying the technology anyway. It also means they can't ship to Europe, because Europe doesn't allow software patents and has a reputation of reverse-engineering such code (think "ARC-4") or just pirating/using it anyway (think: Pretty Good Privacy version 2, international version, which had patented code in it).

Re:Prefer OpenMP (0)

Anonymous Coward | more than 7 years ago | (#18297832)

Hmm. Looks like Codeplay are a European company (based in Edinburgh, Scotland), so not a problem there then.

mod parent up (1)

blackcoot (124938) | more than 7 years ago | (#18297844)

I was just going to ask "who cares, OpenMP does this already"; now I know that I don't care. It's not nearly as interesting as the work done out of NASA Greenbelt on a project called ACE (which actually is a genuinely automatic parallel compiler that targets clusters rather than CPUs --- really kickass concept). My very limited experience with OpenMP is that I prefer the MPI approach. That said, I don't think MPI or OpenMP are really the right answer -- it takes a language that was designed from the ground up to do parallel execution "right". In this case, I think things like HP Fortran actually hurt rather than help, because they're very familiar, which ends up being a bad thing because they're most like something that doesn't solve the problem.

OpenMP can support clusters (4, Informative)

mi (197448) | more than 7 years ago | (#18298624)

Intel's compiler (icc), available for Linux [intel.com], Windows [intel.com], and FreeBSD [freshports.org], extends OpenMP to clusters [intel.com].

You can build your OpenMP code and it will run on clusters automatically. Intel's additional pragmas allow you to control which things you want parallelized over multiple machines vs. multiple CPUs (the former being fairly expensive to set up and keep in sync).

I've also seen messages on gcc's mailing list that talk about extending gcc's OpenMP implementation (moved from GOMP [gnu.org] to mainstream in gcc-4.2 [gnu.org]) to clusters the same way.

Nothing in OpenMP [openmp.org] prevents a particular implementation from offering multi-machine parallelization. Intel's is just the first compiler to get there...

The beauty of it all is that OpenMP is just compiler pragmas [wikipedia.org] — you can always build the same code with them off (or with a non-supporting compiler), and it will still run serially.

Re:OpenMP can support clusters (1)

init100 (915886) | more than 7 years ago | (#18299168)

You can build your OpenMP code and it will run on clusters automatically.

Won't that require some runtime support, like mpirun in MPI (that takes care of rsh/ssh-ing to each node and starting the processes)?

Re:OpenMP can support clusters (2, Insightful)

mi (197448) | more than 7 years ago | (#18299258)

Won't that require some runtime support, like mpirun in MPI (that takes care of rsh/ssh-ing to each node and starting the processes)?

Well, yes, of course. You also need the actual hardware too :-)

This is beyond the scope of the discussion, really — all clusters require a fair amount of work to setup and maintain. But we are talking about coding for them here...

Re:Prefer OpenMP (1)

thanasakis (225405) | more than 7 years ago | (#18299322)

This is good stuff. Did you use any special flags to compile it? How about posting a nice walkthrough somewhere? Unfortunately Xcode ships with 4.0.1 and Fink with 4.1.something.

Regards,
Athanasios

Yup (3, Interesting)

PhrostyMcByte (589271) | more than 7 years ago | (#18297260)

For the majority of apps, OpenMP [wikipedia.org] is enough. That is what this looks like - a proprietary OpenMP. It might make it easier than creating and managing your own threads but calling it "auto" parallelizing when you need to mark what to execute in parallel is a bit of a stretch.

For apps that need more, it is probably a big enough requirement that someone knowledgeable is already on the coding team. Which isn't to say that a compiler/lang/lib lowering the "experience required" bar wouldn't be welcomed, just that I wish these people would work on solving some new problems instead of re-tackling old ones.

The main purpose of these extensions seems to be finding a way to restrict the noob developer enough that they won't be able to abuse threading like some apps love to do. That is a very good thing in my book! (Think Freenet [freenetproject.org] , where 200-600 threads is normal.)

don't worry guys, i got you.... (1)

teknopurge (199509) | more than 7 years ago | (#18297268)

int id = getCurrentProcessorOrCellUniqueID();
int modulo = 3; // change this to your liking

// elite auto-parallelization logic. shouts to the boyz in tha h00d! Hi mom!
if ((id % modulo) == 0) {
    // do proc or cell specific code based on mod
} else if ((id % modulo) == 1) {
    // you get the idea....
}
Just doing my part to save you some $$$$. (dolla, bills yall!)

How long has Sun Studio had "-xautopar"? (2, Informative)

Anonymous Coward | more than 7 years ago | (#18297312)

Yep, it's in there.

And it works, too.

Ok (1)

Psychotria (953670) | more than 7 years ago | (#18297316)

So, anything within the loop (using your example) cannot depend on i-1 being known? So, for the loop:

for (i = 0; i < N; i++) doSomething(i);

doSomething() cannot know or infer i-1. Is that right? So doSomething() really has to regard i as (almost) random. So the loop becomes:

for (i = 0; i < N; i++) doSomething(uniqueRand(i) / RANDMAX * 100);

No wonder it's so complicated and hard to debug ;-)

2 ways to faster code. (0)

AHuxley (892839) | more than 7 years ago | (#18297336)

In Capitalist West no hassle for you to submit slashvertisement about faster proprietary compiler.
In Soviet Russia no hassle to get compiler source code as slashvertisement links to you.

Re:2 ways to faster code. (0)

Anonymous Coward | more than 7 years ago | (#18297646)

MOD PARENT UP.

As I have recently been working in parallel computing, it was very disgusting to find such a hot topic turned into a slashvertisement.

BOO

Been done... (3, Interesting)

TheRealMindChild (743925) | more than 7 years ago | (#18297342)

I have my 'MIPSpro Auto-Parallelizing Option 7.2.1' CD sitting right next to my IRIX 6.5 machine... and I know it's YEARS old

Re:Been done... (5, Interesting)

adrianmonk (890071) | more than 7 years ago | (#18298016)

I have my 'MIPSpro Auto-Parallelizing Option 7.2.1' CD sitting right next to my IRIX 6.5 machine... and I know it's YEARS old

Oh, are we having a contest for who can name the earliest auto-parallelizing C compiler? If so, I nominate the vc compiler on the Convex [wikipedia.org] computers. The Convex C-1 was released in 1985 and I believe had a vectorizing compiler from the start, which would make sense since it had a single, big-ass vector processor (one instruction, crap loads of operands -- can't remember how many, but it was something like 64 separate values being added to another 64 separate values in one single instruction).

I personally remember watching somebody compile something with it. It was really neat to watch -- required no special pragmas or anything, just plain old regular C code, and it would produce an annotated copy of your file telling you which lines were fully vectorized, partly vectorized, etc. You could, of course, tweak the code to make it easier for the compiler to vectorize it, but even when you did, it was still plain old C code.

Not a big deal (1)

the100rabh (947158) | more than 7 years ago | (#18297376)

From what I have seen, this system parallelizes only the part of the code inside the sieve, not the whole program. How is this better than the others? Can someone please enlighten me on that?

Re:Not a big deal (0)

Anonymous Coward | more than 7 years ago | (#18297480)


If it's like other compilers (Sun, IBM) that do this and which I've worked with, it likely looks for things like loops, larger control structures, other shit.

Think of automatic parallelization like C: it gets close to machine-level code, but if you want the real deal you write the important bits in assembler.

Re:Not a big deal (1)

ioshhdflwuegfh (1067182) | more than 7 years ago | (#18298754)

From what I have seen, this system parallelizes only the part of the code inside the sieve, not the whole program. How is this better than the others? Can someone please enlighten me on that?
If this part of the code happens to be the one where most of your CPU time is spent,...

Could this possibly be for real? (1)

gozu (541069) | more than 7 years ago | (#18297486)

I'm no parallelization expert, but it seems to me that a compiler that reliably gives you a scaling factor above 80% would be a huge deal. Is it really possible to achieve that kind of result across the board, or is this a bunch of bull?

Re:Could this possibly be for real? (1)

init100 (915886) | more than 7 years ago | (#18299218)

Nothing new here, move along.

Jokes aside, this is bull. It requires the coder to mark sections that he wants to run in parallel, making the "automatic" part a bit of a stretch. And then, there already is a system that does this, and has done it for 10 years. It's called OpenMP [wikipedia.org] , and features wide industry support in compilers such as the upcoming gcc 4.2, MS Visual Studio .NET 2005, the Intel compiler suite and compilers from traditional SMP vendors such as IBM and SGI.

SmartVariables is a good alternative to MPI / PVM (2, Insightful)

Anonymous Coward | more than 7 years ago | (#18297490)

Let's see if I can teach any old dogs some new trix.

Here is a quote from the SmartVariables white-paper:

"The GPL open-source SmartVariables technology works well as a replacement for both MPI and PVM based systems, simplifying such applications. Systems built with SmartVariables don't need to worry about explicit message passing. New tasks can be invoked by using Web-Service modules. Programs always work directly with named-data, in parallel. Tasks are easily sub-divided and farmed out to additional web-services, as needed - without worry of breaking the natural parallelism. If two or more tasks ever access data of the same name and location, then that data is automatically shared between them - without need for additional parallel programming constructs. Instead of using configuration files with lists of available machines, a shared SmartVariables List object (with a commonly accepted name, like "machines@localhost") could easily hold the available host names, which can then be used for dynamic task allocation. The end-result is that SmartVariables-based parallel systems need only reference and work with distributed data, and don't need to manage it. Automatic sharing means there is no need to worry about explicit connection, infrastructure, or message-passing code. Instead, applications only need agree on the names used for their data. Names and object locations are easily managed by using a SmartVariables based Directory-Service as an additional layer of object indirection."

The rest of this paper is here: http://www.smartvariables.com/doc/DistributedProgramming.pdf [smartvariables.com]

A single code-base works on Apple / Linux / Windows.
Complete code and docs at http://smartvariables.com/ [smartvariables.com]

Re:SmartVariables is a good alternative to MPI / P (0)

Anonymous Coward | more than 7 years ago | (#18297972)

Perhaps some programmers would like their code to run in parallel without a GPL viral license infection?

Re:SmartVariables is a good alternative to MPI / P (1)

init100 (915886) | more than 7 years ago | (#18299250)

It may or may not be easier to program, but will it perform well enough? Does it support high-speed low-latency interconnects like Myrinet or Infiniband? Will it perform well enough to make up for the high price of such interconnects? Gigabit Ethernet performance is not enough on such systems, as latency is a major factor, and the latency of Ethernet is typically high compared to HPC interconnects.

Sounds like multi-threading AND NOT Parallelizing (3, Insightful)

mrnick (108356) | more than 7 years ago | (#18297496)

I read the article, the information at the company's web site and even white papers written on the compiler. And although I did see one reference to "Multiple computers across a network (e.g. a "grid")" there was no other mention of it.

When I think of parallelizing software, after getting over my humorous mental image of a virus that paralyzes users, what comes to mind is clustering. When I think of clustering, the train of thought directs me to Beowulf and MPI or its predecessor PVM. Yet I can find no information that supports the concept of clustering in any manner.

Again, I did see a reference to "Multiple computers across a network (e.g. a "grid")", but according to Wikipedia grid computing is defined as: "A grid uses the resources of many separate computers connected by a network (usually the Internet) to solve large-scale computation problems. Most use idle time on many thousands of computers throughout the world."

Well, that sounds like the distributed SETI project and the like, which would seem even more ambitious than a compiler that would help write MPI code for Beowulf clusters.

From all the examples, this looks like a good compiler for writing code that will run more efficiently on multi-core and multi-processor systems, but it would not help you in writing parallel code for clustering.

Though, this brings up a concept that many people forget. Even people that I would consider to be rather intelligent on the subject of clustering often forget this: if you have an 8-computer cluster with each node running a dual-core Intel CPU, and you write parallel code for it using MPI, you are benefiting from 8 cores in parallel. Many people who write parallel code forget about multi-threading. To benefit from all 16 cores in the cluster I just described, the code would have to be written both multi-threaded and parallel.

One of the main professors involved in a clustering project at my university told me that in their test environment they were using 8 Dell systems with dual-core Intel CPUs, so in total they had the power of 16 cores. Since he has his Ph.D. and all, I didn't feel the need to correct him and explain that unless his code was both parallel and multi-threaded he was only getting the benefit of 8 cores. I knew he was not multi-threading because they were not even writing the code in MPI; rather, they were using Python and batching processes to the cluster. To my knowledge Python cannot write multi-threaded applications. Even if it can, I know they were not (from looking at their code).
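For what it's worth, gluing the two levels together is mostly boilerplate. A rough, untested sketch assuming an MPI implementation plus C++ threads, with a hypothetical processChunk() standing in for the real per-slice work -- MPI spreads ranks across the nodes, and each rank spins up one thread per local core:

    #include <mpi.h>
    #include <algorithm>
    #include <thread>
    #include <vector>

    void processChunk(int rank, int thread_id)
    {
        // Stand-in for the real work: each (rank, thread) pair gets its
        // own slice of the overall problem.
    }

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int rank = 0, nodes = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // which process (node) am I?
        MPI_Comm_size(MPI_COMM_WORLD, &nodes);  // how many processes in total?

        // One MPI process per node, one thread per core on that node.
        const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < cores; ++t)
            workers.emplace_back(processChunk, rank, (int)t);
        for (auto& w : workers)
            w.join();

        MPI_Finalize();
        return 0;
    }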

Sometimes it's the simplest things that confuse the brightest of us....

Nick Powers

are you sure your prof is wrong? (1)

Chirs (87576) | more than 7 years ago | (#18297562)


Assuming their cluster management system knows that each node is dual core, can you explain why they couldn't run two processes on each node?

single process uses 1 core unless multi-threaded (0)

mrnick (108356) | more than 7 years ago | (#18298084)

The operating system on a multiple-core machine can split up the processes but one process can only run on one core unless it has been written in a multi-threaded fashion.

In parallel processing in general, each machine is running one part of a program, thus one program, and unless that program is multi-threaded as well as parallel, it can only use one core per node on a cluster.

Though, someone who writes multi-threaded parallel applications should be held in high esteem! I don't know any such coders.

Nick Powers

Re:single process uses 1 core unless multi-threade (2, Insightful)

julesh (229690) | more than 7 years ago | (#18298456)

The operating system on a multiple-core machine can split up the processes but one process can only run on one core unless it has been written in a multi-threaded fashion.

In parallel processing general each machine is running one part of a program, thus one program, and unless that program is multi-threaded as well as parallel then it can only use one core per node on a cluster.

Though, someone who writes multi-threaded parallel applications should be held in high esteem! I don't know any such coders.


Have you considered that if you run two copies of the process on each node, it will use both cores?

Re:single process uses 1 core unless multi-threade (1)

Mad Merlin (837387) | more than 7 years ago | (#18298482)

I think the parent's point was that the cluster management system could simply batch out two individual (single threaded) processes to each node (for a total of 16 processes over 8 nodes), rather than just a single process per node. This may or may not work, depending on how the processes communicate over the network, but it is certainly a substantially easier task than writing a multi-threaded distributed program in most any case.

Re:single process uses 1 core unless multi-threade (1)

Celandine (610250) | more than 7 years ago | (#18298900)

And the parent was correct -- the original poster is talking nonsense. I'm running MPI code on a cluster consisting of some single-core machines, some 2-CPU boxes, and some 2-CPU 2-core systems. All the cores are in use.

batching code to a cluster is just silly (1)

mrnick (108356) | more than 7 years ago | (#18299206)

Well then you would have non multi-threaded code and non parallel code as well. There are only 2 ways to have jobs run on a cluster.

1) Write the code using MPI or its predecessor PVM.
2) Have non parallel code that has separate programs that each handle a part of the data and batch it to nodes on the cluster.

Method 2 is either done as an initial step to help determine how to split up your processing, so you can use that information to write an MPI or PVM version, or by people who don't know how to write parallel code (MPI or PVM) but still want to be able to use some of the power their cluster provides.

If you chose method 2 and never intended to modify your code to be parallel then I wouldn't go bragging about your great cluster to anyone that knows much about clusters because once they figured out what you were doing they would be shocked at your ignorance and consider it very humorous that you went to so much trouble to build a cluster but never learned how to use it properly.

So, yes, you could use method 2 and send each node 2 batches, and in most cases the operating system would run the second batch it received on a different core than its first batch, though you could not guarantee that, since the operating system might think it's better to have both on the same CPU depending on what other processes were running on the system at that given time.

But what if you had nodes in your cluster with varying numbers of cores / CPUs on each system? This is quite common. So, you could have a cluster with 20 single-core machines, 40 dual-core machines and 10 8-core machines. If you could use all of the cores on all the machines you would have the potential of 180 cores worth of computing power.

To take advantage of this using batching you would have to have 180 separate programs, each handling the calculation of 1/180th of the complete solution, and have written your batch script so that it knows how to allocate those batches based upon the number of cores a machine has. And again, there would be no guarantee that no node would have a single core running more than 1 batch, except for the single-core machines. This uncertainty would increase as the number of cores on a node increased. For example, if you sent 8 batches to a node with 8 cores it would be very unlikely that each batch would be running on a unique core, meaning you could assume that at least 1 core was running multiple batches and at least 1 core was running no batches.

What is even more common is that you have a cluster of varying numbers of nodes with unknown numbers of cores / CPU in each node. In this scenario it would be impossible to batch your code in a manner that would take advantage of all the cores available, not to mention that it would also be impossible to make sure that every node was even participating. In this scenario the only way you could guarantee that your code took advantage of all the nodes and all the cores / CPU of each node would be to write your code in parallel mode (via MPI or PVM) and that the code was also written in a multi-threaded fashion.

It may be more difficult to write parallel code than to batch non-parallel code, for someone who doesn't know how to write MPI or PVM code. And if any of your nodes contained multiple cores / CPUs it would be even more difficult to write a parallel version of the code that could take advantage of the additional cores on the nodes that had them. But if you really wanted to take maximum advantage of all the available resources in your cluster, then you would have to write parallel code that is multi-threaded.

It's not as difficult as it sounds, since the segments of code that could be parallelized are also the segments that lend themselves to being multi-threaded. Both require segments of code where the iterations have no dependencies on previous or subsequent iterations of the same code segment. Some code can be written in parallel and some must be written sequentially; the code that can be written in parallel can also be multi-threaded, and the code that has to be written sequentially cannot be multi-threaded either.

I could write such code, but I work in an environment in which we do not have clusters available, nor do we have significant numbers of algorithms that could take advantage of either parallel coding (if a cluster were available) or multi-threading. There are many, many programs written that simply fork multiple processes of themselves to take advantage of multiple processors if available.

A good example of this would be Apache 1.3. If you look at your process list you will notice several httpd daemons running.

This happens often because identifying algorithms or in many cases alternative algorithms that can take advantage of multi-threading is not a simple task.

But if you are writing your code for a cluster, then you have to identify as many algorithms as possible that can be written in parallel. In doing this you will have identified all the algorithms that could also benefit from multi-threading. The way each is coded is different, but identifying the eligible algorithms is the most difficult part. Taking code that someone had written in parallel and then modifying it so that it would also take advantage of multi-threading would be a very simple task. I am surprised that there are no compilers available that do this for you automatically.

If the author of the story this thread is based upon is reading this, then what I have described above would be a very useful tool for coders writing applications for clusters. And it shouldn't be very difficult to adapt a compiler to automatically create multi-threaded code, since it could use the markup used to identify parallel portions of code to know which code to create a multi-threaded version of as well. Plus you wouldn't have to worry about dependencies, because the programmer would have already taken care of this for the parallel version as well. So, there wouldn't be any need for silly sieve tags in your source.

Also, if you are reading this, then please explain the purpose of the runtime library that the executable code your compiler creates depends upon. There are many examples of multi-threaded applications that don't require any such runtime library. The only reason I can see for the presence of such a library would be to line the pockets of the company that wrote the compiler with royalty fees for distributing the worthless library. If I'm wrong then please explain.

Nick Powers

Re:single process uses 1 core unless multi-threade (1)

init100 (915886) | more than 7 years ago | (#18299294)

This may or may not work, depending on how the processes communicate over the network, but it is certainly a substantially easier task than writing a multi-threaded distributed program in most any case.

Actually, if the message passing implementation is done right, communication between processes on the same node is done through shared memory, not network communication. Mixing threading and message passing in the application code just means unnecessary complexity in this case.

Re:Sounds like multi-threading AND NOT Parallelizi (0)

Anonymous Coward | more than 7 years ago | (#18297708)

Sorry to do the anonymous coward thing, but I'm too lazy to make an account. If he has several processes running at once on each node, the OS schedules them on different CPUs, using both cores. That's what an OS does.

Re:Sounds like multi-threading AND NOT Parallelizi (1)

init100 (915886) | more than 7 years ago | (#18299350)

Sounds like multi-threading AND NOT Parallelizing

Both multi-threading and message passing systems are parallel systems, they are just different subsets of the parallel computing paradigm. You cannot really claim with any authority that multithreading isn't parallel computing, and that only message passing is.

Multithreading is used on a shared memory multiprocessor, and message passing is used on distributed memory multiprocessors. They are just two different ways of implementing parallel code, and none of them is more parallel than the other.

Well, that sounds like the distributed SETI project and the like, which would seem even more ambitious than a compiler that would help write MPI code for Beowulf clusters.

Actually, the parallelization is much more complex in the cluster case than in the distributed computing (e.g. SETI@Home) case. Distributed systems are often processing data packages that are inherently independent of one another, and require no communication between the compute nodes at all. In this case, parallelization just amounts to splitting the work into pieces and handing out to the worker nodes, as well as collecting and aggregating the results.

Clusters, on the other hand, are primarily used for tasks that need (often intense) cooperation between the compute nodes to solve, such as solving large systems of linear equations. Such parallelization is much harder, and I won't hold my breath waiting for such a compiler to appear.

neither useless nor compelling (1)

kokorozashi (652675) | more than 7 years ago | (#18297558)

The trick to taking advantage of the future processors that architecture futurists such as David Patterson envision when they talk about "manycore" chips is to make parallel programming easy. Making the programmer puzzle out the parallelism for himself isn't the way to do that. We already know pre-emptive threading is too difficult for most; putting pervasively parallel programming (PPP) in human hands would be even worse. A proper approach to PPP involves inventing a new language, not adding warts to C++, which already has more than enough of its own, thank you very much.

Where's the torrent? (0)

timecop (16217) | more than 7 years ago | (#18297634)

I'd like to evaluate this new technology.

C++... great... (0)

Anonymous Coward | more than 7 years ago | (#18297640)

Just what we need, another C++ crutch.

Can't we just let that wretched language die already?

Re:C++... great... (1)

Haeleth (414428) | more than 7 years ago | (#18298756)

Can't we just let that wretched language die already?
Not until you find a better alternative that provides the same performance.

Yes, there are still plenty of performance-critical applications. I sure wouldn't want to run an OS written in Python, or a movie compressor written in Ruby. Heck, many such things are still written in C; you should be damn grateful even C++ gets a look-in...

WCF - Dotnet 3.0 (1)

MickDownUnder (627418) | more than 7 years ago | (#18297696)

This is a feature of WCF - Windows Communication Foundation in .NET 3.0 (part of Win V). WCF is designed for next-gen CPUs with large numbers of cores. It spawns worker threads for you as needed and synchronises these calls for you automatically. You have the option of manually creating and synchronising threads, but out of the box it does it all for you behind the scenes. Just imagine coding for a machine with 1024 cores! It's obvious that writing software as we've done in the past, where you manually spawn threads and synchronise them, is never going to effectively use such hardware. You are obviously going to have a framework like WCF (or this compiler) that takes advantage of this for you. Maybe the wow has started now after all hmm? ;) I love being flame bait ... especially when I'm right.

Re:WCF - Dotnet 3.0 (1)

ioshhdflwuegfh (1067182) | more than 7 years ago | (#18298766)

This is a feature of WCF - Windows Communication Foundation in .NET 3.0 (part of Win V). WCF is designed for next gen CPUs with large numbers of cores.
Which is to say you can't use it now, but in the future.

Just imagine coding for a machine with 1024 cores!
wow!

Maybe the wow has started now after all hmm?
Or maybe wow has already started in the future? In your fertile imagination?

Re:WCF - Dotnet 3.0 (1)

init100 (915886) | more than 7 years ago | (#18299362)

This is a feature of WCF - Windows Communication Foundation in .NET 3.0 (part of Win V). WCF is designed for next-gen CPUs with large numbers of cores. It spawns worker threads for you as needed and synchronises these calls for you automatically. You have the option of manually creating and synchronising threads, but out of the box it does it all for you behind the scenes.

So WCF takes care of parallelizing your compute-intensive tasks for you? Sorry, but I don't believe you. It might spawn threads for communication-related tasks, but those aren't really compute-intensive anyway.

Wow - Deterministic Concurrency (1, Interesting)

Anonymous Coward | more than 7 years ago | (#18297910)

Deterministic concurrency is a great aid for debugging - no more race conditions, no more heisenbugs, no more visibly different program behaviour on 1 core, 2-core, hyper-threading, Quad Core, 8 Core, and whatever the Intel and AMD road maps bring out in the future. Looks good for the sanity of all those programmers who have ever had problems manifest only on one machine after testing!

This Sieve programming also seems to make it easier to target the PS3, which has gotten a bad rap as being notoriously difficult to program well. Who wants to break programs into tiny chunks that DMA work in and results out, instead of letting some automated system translate a higher-level program into that low-level programming model? It's about time that getting decent returns on parallelisation was easy. It's also time for the low-level OS threading APIs (POSIX, Win32) to be forgotten and buried. No more locking, data races, deadlocks, and general programming complexity in order to get any speedup out of multi-core systems.

I also like the idea of buying a Physics processor unit (PPU) and having an automatic speed boost in my programs.

RapidMind (1)

khaledh (718303) | more than 7 years ago | (#18298068)

This looks similar to RapidMind [rapidmind.net] , which is a software development platform that, among other things, "Enables applications to run in a data-parallel way." (I'm not affiliated with them.)

Re:RapidMind (1)

Nappa48 (1041188) | more than 7 years ago | (#18299088)

Finally someone mentions Rapidmind!
I guess that it's just not as popular right now, a shame really because it's pretty easy to get into. Still learning though, since I AM new to both C and RapidMind! College gets in the way a lot... meh

Is this better than OpenMP? (2, Informative)

Anonymous Coward | more than 7 years ago | (#18298108)

So, I fail to see what's new about this. As has been mentioned before, OpenMP auto-parallelizes for SMP systems quite well, as long as you know what you're doing. Like anything done in parallel, if you don't figure out where your data and algorithm dependencies are you'll hose your program. If Sieve does some sort of dependency analysis, that would be interesting, but I doubt it would catch all problems. In fact, I imagine it's provably impossible to auto-parallelize in the general case -- it will likely be proven equivalent to the halting problem eventually.

What would be new is when someone substantially improves on MPI. Auto-parallelizing a FOR loop is amusing, doing the same for a complex algorithm moving data around in a cluster, well, that's a different sort of difficult.

Anyway, no matter how many libraries and tools come out to ease the pain, parallel programming is frigging hard. In fact, the more automagic the compiler, the harder it will be to debug when the inevitable race condition sneaks through. Combine this with lowering the bar for parallel programming and letting more idiots in and we can look forward to some truly horrific code. If you make it so any idiot can code, any idiot will!

Why C++ (2, Insightful)

impeachgod (982062) | more than 7 years ago | (#18298254)

Why use C++? Aren't there languages that support parallelizing better, like the functional ones? Or perhaps develop your own language tuned to parallelizing.

Re:Why C++ (1, Insightful)

Anonymous Coward | more than 7 years ago | (#18298856)

Why use C++? Aren't there languages that support parallelizing better, like the functional ones? Or perhaps develop your own language tuned to parallelizing.

Because C++ has power, because C++ is alive, because C++ has speed, and... more importantly...

Because there are a lot of today's applications built on C++, working applications that work today, and won't need a new, unstable version for years because it was re-written from scratch in a new language.

Every three or five years, we get a new very cool framework that will be obsolete three or five years later. No one will refactor an existing application from scratch every three years, just to have this new shining concept, if a viable and simple enough alternative is offered in the existing application's language.

What C++ applications need is not a new language, because today's C++ applications will never adopt it (who will pay the months or years of development costs just to switch language?). What C++ applications need is evolution of the language.

Think about how C++ evolved from C, and remains compatible, which greatly eased the slow and constant evolution of a living application from C to C++. The next C++ (C++++, or, perhaps ++C?) will need to be compatible enough with C++ and C to enable C++ and C code mixing within one's library or executable.

Not that we need a ++C. C++ is alive enough today. OK, looking at Boost or templates, some would call it cancer metastasis. But is this truly warts, or is it the fear of new concepts introduced in an otherwise familiar language? Perhaps C++ doesn't need a new language. Perhaps what C++ needs is C++ language evolution and C++ developers' cognitive evolution.

Auto-parallelizing? (1, Insightful)

Anonymous Coward | more than 7 years ago | (#18298814)

How can a compiler be automatic when it needs you to mark the parallel sections? It just simplifies the use of threads, but you still have to find the parallelism and write your code in parallel. It is like OpenMP...

Minimum hassle? (1)

thewiz (24994) | more than 7 years ago | (#18299164)

'What Sieve is is a C++ compiler that will take a section of code and parallelize it for you with a minimum hassle.'

What does the compiler do, taunt you with harsh language while it compiles your code?