
What Makes Parallel Programming Difficult?

Unknown Lamer posted more than 3 years ago | from the double-the-threads-double-the-fun dept.

Programming 196

An anonymous reader writes "Intel's Aater Suleman writes about why parallel programming is difficult. ... I was unaware ... that a major challenge in multi-threaded programming lies in optimizing parallel programs, not just getting them to run. His analysis is insightful and the case study is very enlightening if you are unfamiliar with parallel code debugging."


Easy! (4, Funny)

Anonymous Coward | more than 3 years ago | (#36266270)

Motherboards rarely have parallel ports these days!

Re:Easy! (5, Funny)

sgbett (739519) | more than 3 years ago | (#36266352)

is that things sometimes happen in the wrong order.

Re:Easy! (5, Funny)

sgbett (739519) | more than 3 years ago | (#36266376)

The problem with parallel programming

Re:Easy! (-1, Troll)

sgbett (739519) | more than 3 years ago | (#36266398)

C-C-C-OMBO BREAKER!

Re:Easy! (-1)

Anonymous Coward | more than 3 years ago | (#36266552)

samefag

Re:Easy! (0)

sydneyfong (410107) | more than 3 years ago | (#36266558)

Q)Why did the multithreaded chicken cross the road?
A)to To other side. get the
Q)Why did the multithreaded chicken cross the road?
A)other to side. To the get

Re:Easy! (1)

blair1q (305137) | more than 3 years ago | (#36267958)

That's a different problem, related to the fact that people buy "parallel" but forget to read the part where it says "synchronize". It's kind of like not validating input, or ignoring return values. Not so much a pitfall as a failure to have basic skills.

The problem these folks are having is that they bought parallel but decided it just wasn't optimizing their stuff enough. It'd be really funny if they found they could optimize-away the parallelization, like, by precomputing certain results and using a lookup, or something. That's how the oldbies did it [datavis.ca] .

Re:Easy! (1)

mybecq (131456) | more than 3 years ago | (#36266738)

Difficult Parallel Makes Programming What?

(Prior message was optimized for concurrent throughput).

Re:Easy! (0)

Anonymous Coward | more than 3 years ago | (#36266758)

is that things sometimes happen in the wrong order.

Not really. The problem is knowing what is and what isn't worth spreading over multiple cores.

Re:Easy! (2)

BadPirate (1572721) | more than 3 years ago | (#36266926)

Yes it is happen when that terrible's.

Re:Easy! (2)

Acius (828840) | more than 3 years ago | (#36267128)

- The nazi grammar parallel

No combination above gramatically words possible of the is correct. You 's drop the should.

Re:Easy! (1)

oztiks (921504) | more than 3 years ago | (#36268022)

Because it's just as difficult as parallel parking?

I think they made a movie about this... (1)

thelenm (213782) | more than 3 years ago | (#36266338)

"You're just not thinking fourth-dimensionally!"

"Right, right, I have a real problem with that."

Re:I think they made a movie about this... (1)

dch24 (904899) | more than 3 years ago | (#36266476)

McFly! But really, the first and second movies were way better.

If you've read the article, you'll feel a little smarter today. Even if it was just a good review.

If you haven't heard of Amdahl's Law [wikipedia.org], pay attention. (Since simple.wikipedia.org doesn't have a nice, short explanation, how about a car analogy?)

Amdahl's Law
You're street racing. But there's a school zone in the middle of the course, and you always slow down for the school.

No matter how fast you go on the rest of the course, that school zone in the middle will take up a larger and larger percentage of your lap time, until it alone is what determines your lap time.

Applying the analogy to parallel computing: after a certain number of cores, adding more cores doesn't make the program run faster. Something else (probably many things) will eat up so much of the total time that doubling, quadrupling, or adding a hundred more cores won't improve things.
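
For the curious, here is a minimal numeric sketch of Amdahl's Law in C; the 10% serial fraction is just an assumed example, not a figure from the article:

    #include <stdio.h>

    /* Amdahl's Law: speedup = 1 / (s + (1 - s) / n),
       where s is the serial fraction and n is the number of cores. */
    int main(void) {
        double s = 0.10;  /* assume 10% of the work is inherently serial */
        int cores[] = {1, 2, 4, 8, 64, 1024};

        for (int i = 0; i < 6; i++) {
            int n = cores[i];
            double speedup = 1.0 / (s + (1.0 - s) / n);
            printf("%4d cores: %6.2fx speedup\n", n, speedup);
        }
        /* With s = 0.10 the speedup never exceeds 10x, no matter how many cores. */
        return 0;
    }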

Re:I think they made a movie about this... (0)

Anonymous Coward | more than 3 years ago | (#36266650)

Shouldn't we be looking at how manufacturing industries have solved those problems? Aren't they doing "parallel manufacturing" all the time?

Let's pick a "random" item like, say, the iPhone. It doesn't matter if one part is built into another; it's just yet another parallel process somewhere.
- glass back
- glass front
- LCD panel
- capacitive touch overlay
- CPU
- RAM
- wireless ICs
- various resistors and capacitors
- connectors
- buttons
- PCB
- all the software (iOS, drivers, APIs, Mail, Safari, iTunes, Maps, etc)

If a single part is missing, the whole process gets delayed but some parts are not dependent on others. It's not important if the LCD isn't ready at the PCB assembly stage, but it is important at the final assembly stage.

Looking at the problem like a race with a single car is not going to help you understand it.

Re:I think they made a movie about this... (1)

adonoman (624929) | more than 3 years ago | (#36266814)

Amdahl's law is saying that no matter how much parallelization you get, you are still going to be limited by the longest non-parallelizable path. This is no different from your real-life example - your longest non-parallelizable path (say CPU->PCB assembly->ROM flashing->final assembly->packaging) is going to limit how fast you can manufacture your iPhones. It doesn't matter how fast you can manufacture glass backs or RAM; if they're not on that critical path, you won't be speeding up your result at all.

Re:I think they made a movie about this... (1)

dch24 (904899) | more than 3 years ago | (#36267176)

Good point, but with my post (GGP) I really was hoping to spark this kind of discussion. Maybe manufacturers will find solutions to programming problems, you never know...

Re:I think they made a movie about this... (1)

readin (838620) | more than 3 years ago | (#36267286)

A couple of problems with the analogy:
1. In manufacturing, the idea is to do the exact same thing a jillion times with the exact same result. Interchangeable parts make different rates of production easier to deal with. In computing this isn't the case. The assembly line may be identical each of the jillion times, but the data going through it is not.
2. Manufacturing plants are expensive to build. It makes sense for an engineer to spend weeks or even months optimizing the process. We don't have that luxury for most of the code we write.

Re:I think they made a movie about this... (1)

ildon (413912) | more than 3 years ago | (#36266778)

Your understanding of Amdahl's Law isn't quite correct. I guess if you want to use the street race analogy, it's more like you each have your own lane for part of the race, but for another part of the race you have to go single file into a single lane, and if you're parallel to someone as you approach the choke point you have to arbitrarily let them ahead of you or pass them based on some set of rules that may not necessarily be fair. And the goal isn't to win the race but you're all delivering something and the goal is to move the maximum amount of the goods as fast as possible.

You know what, a street racing analogy doesn't work at all for Amdahl's Law.

Re:I think they made a movie about this... (1)

dch24 (904899) | more than 3 years ago | (#36267202)

Amdahl's law is just about speedups. It doesn't imply parallel operations at all. I know the wikipedia page says it's about parallel processes. It's not.

Re:I think they made a movie about this... (1)

ildon (413912) | more than 3 years ago | (#36267792)

I didn't look at the wikipedia page. This is from memory from the parallel programming class I took. Either way the main point is that your analogy was poor.

Re:I think they made a movie about this... (1)

Opportunist (166417) | more than 3 years ago | (#36267144)

Not quite, Amdahl's Law says that no matter how well you optimize your parallel tasks, the one that you cannot parallelize will gobble up your time. I don't think there's a good car analogy for that.

Ask your project manager. He's dealing with that problem on a daily basis.

Re:I think they made a movie about this... (1)

dch24 (904899) | more than 3 years ago | (#36267230)

I am my project manager, you insensitive clod. :-)

People are like cars. The faster you drive them, the more hot air you get out their rear end.

Re:I think they made a movie about this... (0)

Anonymous Coward | more than 3 years ago | (#36267504)

> I don't think there's a good car analogy for that.
Sure there is: compare the amount of gas and the number of cylinders in the car to the amount of force generated.
At some point just adding more cylinders isn't going to give you more power relative to the amount of gas consumed (2cc vs 16cc engine).

Re:I think they made a movie about this... (1)

Opportunist (166417) | more than 3 years ago | (#36267598)

While true, this isn't due to the problem of not being able to parallelize more tasks. If you want to use the "adding more doesn't mean more gets created" metric, use the myth of adding more people to a project to speed it up. Even setting aside training and other overhead, you cannot simply parallelize everything in software creation. Let's do a simple example.

There are two tasks that can be run in parallel: an administrative task (like meetings and organizing crap) and the programming task of creating the actual program. They're dependent on each other, i.e. nothing's complete until both are done, but they can be run in parallel to each other. The admin task takes 2 time units, the programming task takes 10. Now, all overhead aside, let's assume that the programming task is actually 10 tasks each taking one time unit. In other words, you could use 5 programmers, each of them taking 2 time units to complete two of the 10 tasks, which would parallelize nicely with the 2 time unit admin task. Adding more programmers and parallelizing more programming tasks does not allow you to complete the overall job faster, since you would still have to wait for the admin task to complete.

Good PMs know where to put their resources to get the best results and parallelize what they can. Bad ones will try to parallelize everything and waste a lot of resources, leading to dead hours while waiting for the ones that cannot be parallelized to complete.

It's amazing how many bad PMs exist...

Re:I think they made a movie about this... (1)

jd (1658) | more than 3 years ago | (#36266944)

Well of course you'd have a problem if using reals. For four dimensions, you need to typecast to quaternions.

use the right tool (0)

Anonymous Coward | more than 3 years ago | (#36266354)

Erlang (and other functional, single-static-assignment languages) are perfect for parallel programming. Sure, you can do parallel programming in C or Ruby or Python, but you need to be very careful about side effects. And the compiler doesn't know enough, so you need to manage all that shit yourself. Analogy time: If you were on the prowl for some vagina, you could try a woman, a ladyboy, or a man. Which one would you choose?

Use a practical tool (1)

mangu (126918) | more than 3 years ago | (#36267784)

Erlang (and other functional, single-static-assignment languages) are perfect for parallel programming.

Okay, then the only problem is getting something useful out of Erlang.

Back in 1985 the Japanese government announced a "fifth generation" computing project, with software to be developed in Prolog. So I went and learned Prolog, an intriguing and amusing language. Only problem is, it was totally useless for any actual application, as the Japanese found out.

Sorry, but in order to believe any of the promises of one of those non-von Neumann languages, I have to see a practical working application first.

Multithreading some problems is (probably) hard (1)

betterunixthanunix (980855) | more than 3 years ago | (#36266356)

There is a class of problems, the P-complete problems, which are (probably) inherently hard to multithread in any advantageous way, and this class includes some pretty important real-world problems:

http://en.wikipedia.org/wiki/P-complete [wikipedia.org]

Re:Multithreading some problems is (probably) hard (1)

TaoPhoenix (980487) | more than 3 years ago | (#36266592)

I'll reply to you.

You're a bit smarter than me, but I think you're saying there are lots of tasks that are easy to run on a single linear code thread but are hard to split and recombine without losing time/resources.

PLC programmers have been doing this for years... (1)

superkurty (582998) | more than 3 years ago | (#36266362)

PLC code for automated mechanical systems runs into all of the same timing issues etc. Obviously the PLC code is much smaller, but this is mostly due to the instructions acting on smaller data sets; it can still be just as complicated.

Re:PLC programmers have been doing this for years. (0)

Anonymous Coward | more than 3 years ago | (#36266478)

Are you kidding?

A PLC runs the same code over and over again in an endless loop. This is a language that does not even have _ANY_ sort of mutexes, semaphores, or threads. Hell, most PLCs don't even have local variables, or even function arguments!

Many PLCs let you configure interrupt handlers, but there is no way to synchronize them with the main loop.

Re:PLC programmers have been doing this for years. (0)

_0xd0ad (1974778) | more than 3 years ago | (#36266622)

Mutexes are just bits that are turned on or off. Semaphores are integers. Local variables are memory locations that aren't used by the rest of the program (or the rest of the program may use them freely and their value need not be retained from scan to scan). Function arguments are just memory locations that are written by the rest of the program before calling the subroutine, which reads them and uses their values.

The only one of those that I actually haven't used in a PLC program is semaphores.
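
For illustration, a rough C-flavoured sketch of the idea (hypothetical names, not real ladder logic): in a cyclic PLC scan the whole program runs in one loop, so a "mutex" really can be a single bit that a station sets before touching a shared resource.

    /* One scan cycle, run over and over by the PLC runtime. */
    int station_a_wants, station_b_wants;    /* inputs          */
    int resource_busy;                       /* the "mutex" bit */

    void scan(void) {
        if (station_a_wants && !resource_busy) {
            resource_busy = 1;               /* claim the resource */
            /* ... drive outputs for station A ... */
        }
        if (station_b_wants && !resource_busy) {
            resource_busy = 1;
            /* ... drive outputs for station B ... */
        }
        /* elsewhere: the owning station clears resource_busy when done */
    }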

Re:PLC programmers have been doing this for years. (0)

_0xd0ad (1974778) | more than 3 years ago | (#36266698)

No wait - I take that back; I've used semaphores too.

Re:PLC programmers have been doing this for years. (1)

superkurty (582998) | more than 3 years ago | (#36267746)

This is probably why many programmers experience such difficulty programming PLCs efficiently... Lots of very large, very complicated programs are just "the same code running endlessly". The devil is in the inputs.

unaware? WTF? (1)

Nadaka (224565) | more than 3 years ago | (#36266370)

I am kinda curious how anyone even tangentially involved in programming could not be aware that the real problem with parallel programming is doing it for a gain in efficiency. Making a thread or process is generally just a couple of lines of code; synchronization with data separation, mutexes, and avoiding deadlocks and race conditions have been solved since almost the beginning of parallelism.
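
For reference, "a couple of lines of code" is barely an exaggeration; a minimal pthreads sketch (the worker function and its argument are made up for illustration; compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical worker: just prints the argument it was given. */
    static void *worker(void *arg) {
        printf("hello from thread %ld\n", (long)arg);
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, (void *)1L);   /* spawn the thread    */
        pthread_join(t, NULL);                          /* wait for it to end  */
        return 0;
    }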

Re:unaware? WTF? (5, Insightful)

Mongoose Disciple (722373) | more than 3 years ago | (#36266416)

synchronization with data separation, mutexes, and avoiding deadlocks and race conditions have been solved since almost the beginning of parallelism

And yet people constantly get these details wrong in practice.

It's an extra layer of complexity and it introduces extra chances to make mistakes, even around areas where a programmer could know better. There's not much way around that. If people only made coding mistakes around difficult problems, software would be dramatically more bulletproof than it actually is.

Re:unaware? WTF? (1)

Hooya (518216) | more than 3 years ago | (#36266562)

I tend to think of it as an extra dimension in code. With non-parallel code, the sequence of the code as written is the same as the sequence in which it runs. With parallel code, the run-time sequence is different from the code as it's laid out in source.

I see people have trouble with just async stuff (e.g. AJAX) and have a hard time wrapping their mind around the fact that even though the callback function is in-sequence with the rest of the code, it's not actually called in that sequence - hence the 'callback'. Now go full tilt with parallel code, and heads start to spin, since the code you're looking at can be run completely out of sequence with where it appears in the call graph.

The easiest time I've had with parallel programming was with Erlang.

Re:unaware? WTF? (1)

johanatan (1159309) | more than 3 years ago | (#36266772)

That's the same way I like to think of 'higher-order programming' (with or without threads)-- an extra degree of freedom.

Re:unaware? WTF? (1)

dgatwood (11270) | more than 3 years ago | (#36266930)

I see people have trouble with just async stuff (e.g. AJAX) and have a hard time wrapping their mind around the fact that even though the callback function is in-sequence with the rest of the code, it's not actually called in that sequence - hence the 'callback'.

In my experience, the reason people have so much trouble with async stuff is that every single JavaScript API I've seen that does things asynchronously is rather fundamentally designed *wrong*. They don't provide any clean mechanism for passing data structures with variables from the calling function into the callback short of constructing a helper function on the fly. Since most people have a hard time wrapping their head around the concept of creating anonymous functions, many programmers end up using global variables. This works until things happen out of order or they need to use that code in parallel. Then, things break miserably, and the entire mechanism falls flat.

Also, JavaScript overuses callbacks. If you can't guarantee that other things won't happen in the DOM tree before the callback fires anyway, it would be a much saner design to simply declare that "this synchronous call may block, and while this call is blocked, other activity may occur". The only thing callbacks do in JavaScript is force you to do a crapton of extra work to pass local variables into the callback. This callback-heavy design causes a lot of extra work for developers, while providing exactly zero benefit. It's not like JavaScript is raw code running in a native thread or anything—the main program loop isn't JavaScript code; it's part of the interpreter—so it's almost exactly as easy for the interpreter to save off the function state and restore it transparently as it is for that interpreter to handle a return statement and trigger a callback at a particular point.

Compared with JavaScript, parallel programming using sane languages and sane APIs is trivial. Contrast the JavaScript custom function kludge with the much cleaner block syntax used by Apple, or even plain old C callbacks with refCon/userData parameters, and it's easy to see why people hate writing asynchronous code in JavaScript. It's not that writing asynchronous code is inherently hard, but rather that writing asynchronous code in a language that goes out of its way to make it hard, and using an API set that similarly goes out of its way to make it hard is inherently hard.
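
A small sketch of the "plain old C callback with a refCon/userData parameter" pattern mentioned above; the API names here are invented for illustration, not from any real library:

    #include <stdio.h>

    /* Hypothetical async API: calls cb(result, userData) when the request completes. */
    typedef void (*done_cb)(const char *result, void *userData);

    struct request_ctx {        /* state the caller wants back in the callback */
        int request_id;
    };

    static void on_done(const char *result, void *userData) {
        struct request_ctx *ctx = userData;     /* no globals needed */
        printf("request %d finished: %s\n", ctx->request_id, result);
    }

    /* Stand-in for the library; a real one would complete asynchronously. */
    static void start_request(const char *url, done_cb cb, void *userData) {
        (void)url;
        cb("200 OK", userData);
    }

    int main(void) {
        struct request_ctx ctx = { 42 };
        start_request("http://example.com", on_done, &ctx);
        return 0;
    }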

Re:unaware? WTF? (0)

Anonymous Coward | more than 3 years ago | (#36267720)

Javascript 1.7 has generator functions, which can "yield" (return) in the middle of a function and resume at the same spot. This can be used to build coroutines, although it's still not entirely natural since the caller has to know how to deal with generators. Also JS 1.7 is currently only supported by Firefox.

Re:unaware? WTF? (1)

Nadaka (224565) | more than 3 years ago | (#36266594)

I blame that mostly on the languages not being explicit about what operations and methods are or are not thread safe. And for capable programmers, those errors generally only occur when you try to avoid excess mutexes and data duplication in the pursuit of efficiency.

Re:unaware? WTF? (0)

Anonymous Coward | more than 3 years ago | (#36266820)

I blame that mostly on the languages not being explicit about what operations and methods are or are not thread safe.

And that is why we are still doing parallel programming in Fortran 95. Any good F95 compiler would have caught the bug mentioned in the program. The language standard only allows constructs and procedures/subroutines inside parallel constructs that are explicitly declared "thread safe" (elemental/pure). And the criteria for that are well defined and get checked at compile time.
Saved my ass countless times.

Re:unaware? WTF? (1)

Anonymous Coward | more than 3 years ago | (#36266694)

synchronization with data separation, mutexes, and avoiding deadlocks and race conditions have been solved since almost the beginning of parallelism

And yet people constantly get these details wrong in practice.

It's "Simon Says" programing. You can't just say "raise your right hand", you constantly have to remember to say "Simon says raise your right hand".

As anyone who has played the game can attest, in the midst of keeping track of all the outlandish directions it's easy to forget the distinction between "raise your right hand" and "Simon says raise your right hand". The human brain can only handle so much at once. Tacking on an additional things to keep track of just makes things harder.

Re:unaware? WTF? (1)

Anonymous Coward | more than 3 years ago | (#36266710)

I think you misunderstand what was said.

A bad programmer (hint: most programmers are bad programmers) can just as easily write unreliable software with only 1 thread. In this sense, it's understandable that many programmers can't write stable multi-threaded programs. On the other hand, a disciplined and experienced software engineer who understands synchronization can churn out lots of code that runs safely in multiple threads. Sure, they will be scratching their heads on the occasional race condition, but that's just analogous to forgetting to zero out a dangling pointer, or whatever; it may be trickier to reproduce and therefore diagnose than some bugs, but it's otherwise just like any other bug.

The real problem with parallel programming that I see (and I think this is what the commenter you're replying to is getting at with the first sentence or so) is that it can be hard to make the parallelism give you an actual speedup. You can look at a problem and say, "Ah ha! I am CPU bound! More cores will alleviate this!" Then it turns out after implementation that there's not much gain, or your work queues and synchronization become a bottleneck, or whatever. Doing it safely is not a huge problem. Doing it well is.

Re:unaware? WTF? (0)

Anonymous Coward | more than 3 years ago | (#36266590)

I am kinda curious how anyone even tangentially involved in programming could not be aware that the real problem with parallel programming is doing it for a gain in efficiency. Making a thread or process is generally just a couple of lines of code; synchronization with data separation, mutexes, and avoiding deadlocks and race conditions have been solved since almost the beginning of parallelism.

These problems have definitely not been solved in the general case - they have been solved for a subset of parallel programming problems.

Re:unaware? WTF? (1)

jd (1658) | more than 3 years ago | (#36266994)

Yes they have. It's called Critical Path Analysis.

Moral of the story... (2)

kvvbassboy (2010962) | more than 3 years ago | (#36266378)

learn2efficientparallelalgorithms.

Good tutorial for someone who wants to jump into some parallel programming, but it's mostly Operating Systems 101 (or 601).

Honestly though, if you have not optimized your algorithm or code for parallelism and you want to do it now, you would probably be better off writing the whole thing from scratch, and the tutorial explains why very nicely.

Re:Moral of the story... (1)

betterunixthanunix (980855) | more than 3 years ago | (#36266430)

Assuming that there is a good way to take advantage of multiple processors for the particular problem your code solves. For a course project once, we found ourselves trying to solve a linear program, and someone suggested using the research cluster to speed things up. As it turns out, linear programming problems are P-complete, and there is no known way to make meaningful use of multiple cores (it is very likely that no such method even exists, and that the problem is inherently sequential).

Re:Moral of the story... (0)

Anonymous Coward | more than 3 years ago | (#36266534)

Even with inherently sequential problems, you can use parallelization to pipeline the solution and increase throughput (as opposed to latency). Very rarely are computers used to perform a task just once, so this works nicely.

Re:multiple processors (1)

TaoPhoenix (980487) | more than 3 years ago | (#36266728)

I'll take a stab and reply to you that "8 processors was enough for anyone", in the sense that multiplexing 8 programs is just insane. Better to just run 8 programs each on their own core, and use some programs that can use 4 cores at a time. That leaves 4 free.

(Overly simplistic.) I agree, but 1024 cores is not the answer. We need the next generation in raw core power to move computing forward. 8 killer cores will beat 1024 mediocre cores.

Depends on the Problem... (1)

Anonymous Coward | more than 3 years ago | (#36266392)

Not all tasks are suited for parallel programming. Some are and can be highly optimized, but if you have something that simply can't be reduced further then parallel programming will not solve your problems.

Re:Depends on the Problem... (0)

Anonymous Coward | more than 3 years ago | (#36266682)

Yep. For example, a ladyboy has a mouth and an ass, so "she" can please two cocks at once. A real woman also a vagina, so she can please three cocks at once. But a man only has one cock, so he can only please one orifice at a time.

Re:Depends on the Problem... (0)

Anonymous Coward | more than 3 years ago | (#36266730)

It would have been a much nicer example if he had gained any performance from his multi-threading. Getting a negligible speedup for all that effort and 4 cores doesn't a compelling case make. The missing analysis, of course, is that the task is memory bandwidth limited, and so the 4 cores are sitting there waiting for their requested cache lines most of the time. Doing that analysis *before* investing in multi-threading is pretty important.

Re:Depends on the Problem... (1)

Bengie (1121981) | more than 3 years ago | (#36267384)

I've seen one example of a threaded serial task. Intel has a Hyper-Threading optimization PDF somewhere showing some interesting tricks using HT. The example they had was on an i7, so fairly recent. They had a serial task that iterated through an array. They loaded an extra thread that synced with the primary thread, ran it on the other virtual thread, and all it did was call the prefetch instruction on the array.

Even though they had a modern architecture with advanced prefetching and linear memory access, using the prefetch command on a separate thread on the same core increased the performance by 30% over the working thread doing the prefetching itself. The biggest issue with this approach is making sure the second thread doesn't get too far ahead of the working thread, as too much prefetching will start to cause cache evictions and make it slower.

Corner case, I know, but still interesting.

I would assume something simple like this would do the above. quick napkin code, mind any errors

WorkerThread:
    volatile int i;
    for (i = 0; i < len; i++)
        do_work(array[i]);      /* placeholder for the real per-element work */

PrefetchThread:
    while (i < len) {
        int x = i;              /* where the worker currently is */
        int max = x + 1024;     /* prefetch at most 1024 ahead   */
        if (max > len) max = len;

        for (int j = x; j < max; j++)
            prefetch(array[j]);

        sleep(1);
    }

no locking involved, and the prefetch thread never goes more than 1024 elements ahead.

This is kind of a bad example in that there is little computational work being done, and 'i' would have to be volatile because if the value of i was stored only in a register during the loop, the other thread would never see the increments. But it gives the idea, being only pseudo-code.

Re:Depends on the Problem... (1)

Bengie (1121981) | more than 3 years ago | (#36267408)

uhhggs.. ignore the code.. something happened and some of the code that previewed didn't show up.

Obvious department of obviousness (0)

johnwbyrd (251699) | more than 3 years ago | (#36266394)

Jebus in a sidecar, Slashdot; do you think we've all never heard of parallel programming before? OpenMP has been around for ten years now. Yeah, you can't write a for loop in C and expect magical parallelization. Seriously, is there ANYONE here who doesn't know this already?

If you're trying to dumb down your article content to the point where it annoys any vaguely experienced programmers... well, you're succeeding.
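
For anyone who hasn't seen it, the OpenMP version of "a for loop in C" that actually does parallelize is roughly one pragma; a minimal sketch (compile with an OpenMP-capable compiler, e.g. gcc -fopenmp):

    #include <stdio.h>

    int main(void) {
        static double a[1000000];

        /* Each iteration is independent, so the runtime may split the loop across cores. */
        #pragma omp parallel for
        for (int i = 0; i < 1000000; i++)
            a[i] = i * 0.5;

        printf("%f\n", a[999999]);
        return 0;
    }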

What? (1)

YodasEvilTwin (2014446) | more than 3 years ago | (#36266444)

This hasn't been news for 30+ years. Parallel programming is hard for a multitude of well-known technical reasons; this is nothing new. Another major reason is that the human brain is no good at parallel programming; again, nothing new. I say let's either try to solve the problems or move on.

Re:30 years (1)

TaoPhoenix (980487) | more than 3 years ago | (#36266834)

"The news of the problems of parallelization is still news after 30 years"
- paraphrase of Christian saying

A broader problem that also lacks formalization (1)

amn108 (1231606) | more than 3 years ago | (#36266454)

In my opinion, many problems with software development are just as applicable in other domains of our life, and parallel programming is definitely one of them. We have just as much trouble managing large teams of people working in parallel. These are problems of logistics, management and also (no, I am not joking) cooking. And we're as bad at handling those as we are at handling software development. It may be right, however, to start solving this with computers - no need to throw away rotten food and/or burned-out employees under delusional management.

What could help is a trusted set of formal theories. That would be a start. Whether it will be a mathematician, a computer programmer, or a kitchen chef that writes this set, is not important. Stripped bare of the bathwater - abstracted and proven - the baby will be what we need.

Re:A broader problem that also lacks formalization (1)

betterunixthanunix (980855) | more than 3 years ago | (#36266508)

There has been quite a bit of work on formalizing parallel computing. NP problems are exactly that: problems that can be solved efficiently on a computer that can explore an unbounded number of solution paths in parallel. There is also the NC hierarchy, which can be thought of as problems that can be solved efficiently on a sequential computer and "much more efficiently" on a parallel computer (that is, polynomial time on a sequential computer, and polylogarithmic time on a parallel computer with a polynomial bound on the number of processors).

Re:A broader problem that also lacks formalization (0)

Anonymous Coward | more than 3 years ago | (#36266870)

I've often thought that a good approach to parallel programming would be a language that represented something like a PERT chart. You spell out the resource dependencies, the language enforces them (statically checks that your resources are not used incompatibly in two places that you have declared parallelizable), and then the compiler spits out code for n-processors. You would also need a way to spell out a pipelined operation. Bonus points for being able to hint at (or, better, infer) critical path.

Obviously, you can do all these things with thread primitives, but somehow the syntax doesn't match up with the PERT graph in my head. Also, there is nothing forcing you to write things in a thread safe way. Really, that static checking is key for reliability.

If I had the ability to finish any project I ever started, I would try to create such a language, but I'm too much of a slacker. Probably easier to get my head around a functional language.

Re:A broader problem that also lacks formalization (1)

amn108 (1231606) | more than 3 years ago | (#36267458)

Creating such a novel concept of a language is a monumental and exceedingly involved task, partly because few have been written, and even fewer are usable by mere mortals, who, mind you, happen to be using imperative languages just fine. Articles upon articles, full of either terms-within-terms or a lack thereof, have been written. Some people once set out to define the kind of new computing you seem to be touching on and created www.tunes.org, which has since stagnated.

I would advise you to write down your ideas and always fragment the problems you identify and/or abstract them to the point where you can spot the beginning of a formal theory you can publish. That way you will approach the solution you often cannot see in predictable, verifiable and, not least, visible and memorable steps. Also, that way, even if you don't come within reach of the final goal, which is a working language, you will have developed a whole set of usable laws and theories that can help others in their quest for the same goal. If you know you can't reach the goal, at least be a gentleman and pave a good road for those that find your path later.

"What Makes Parallel Programming Difficult?" (2, Insightful)

Anonymous Coward | more than 3 years ago | (#36266456)

That you have to do everything all at once. How would you tell 50 kids to sort 50 toy cars? How would you tell 50 footballers to line up by height all at once? How would you have 50 editors edit the 50 pages of a screenplay all at once so that it makes sense from a continuity perspective? All these problems are very easy, but become very hard when you have to do it all at once...

Re:"What Makes Parallel Programming Difficult?" (1)

Short Circuit (52384) | more than 3 years ago | (#36266744)

The ultimate solution to most of those appears to be "you have to do it fifty times, so assign one job to each person".

That said, not all problems are so easily dealt with as bulk operations. Perhaps you have some real need to take one item and split the work up among multiple people. The real driving force for all the scenarios I can think of is a desire or need for an upper bound on job latency.

I do this kind of work as my day job. I've also got some experience in managing groups of coders (at work, not particularly via the site linked to in my sig). The problems are remarkably related. You know that job you had where there were meetings held once a day to synchronize between multiple groups? You know how you hated how those meetings wasted your time? That's the kind of overhead that parallel programmers weigh and try to avoid when writing their code.

Who knows? Perhaps "lockless" algorithms will find some particular use in real-world management problems.

Re:"What Makes Parallel Programming Difficult?" (0)

Anonymous Coward | more than 3 years ago | (#36266922)

>Perhaps "lockless" algorithms will find some particular use in real-world management problems.

No, the secretary is the lock, the thing that makes sure that the hotel you're fucking your mistress in is not the same one you're spending your wedding anniversary with your wife at. There will never be a "lockless" time when you can fuck all three (mistress, wife, secretary) at once.

Re:"What Makes Parallel Programming Difficult?" (1)

Nadaka (224565) | more than 3 years ago | (#36267036)

You can if they swing that way.

Re:"What Makes Parallel Programming Difficult?" (1)

jbolden (176878) | more than 3 years ago | (#36267440)

How would you tell 50 footballers to line up by height all at once?

Easy: break into n subgroups, have each line up by height, and then perform pairwise merges.

Lame (3, Interesting)

oldhack (1037484) | more than 3 years ago | (#36266494)

I bet there are hundreds of simple tutorials on concurrent processing on the web, and these two pieces bring nothing new/interesting.

Re:Lame (0)

Anonymous Coward | more than 3 years ago | (#36266766)

Even more, I didn't see a single mention of FPGA programming, a naturally concurrent environment. Perhaps they should look a little closer to the metal to find tried solutions to bring back to high level.

We are dudes. (-1)

Anonymous Coward | more than 3 years ago | (#36266580)

We don't do parallel. Girls are much better at that, but what self-respecting girl wants to program...

a fix (1)

gargeug (1712454) | more than 3 years ago | (#36266596)

Well, we just learned about this in a graduate comp arch course and yeah, it can get hairy. Especially if using a processor consistency model as opposed to the sequential consistency model. The easiest fix is to throw up some fence instructions around the interdependent code sections to force sequential consistency, and then lay some flags as a means of time signaling between the processors. This eliminates the randomness that the author discussed in one of the examples.
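
A minimal sketch of the "fences plus a flag" idea in C11 atomics; the variable names and payload are made up, and a release/acquire store on the flag would usually be the more idiomatic form:

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static int payload;               /* ordinary shared data                */
    static atomic_int ready = 0;      /* flag used to signal between threads */

    static void *producer(void *arg) {
        payload = 42;                                   /* write the data first        */
        atomic_thread_fence(memory_order_release);      /* fence: data before the flag */
        atomic_store_explicit(&ready, 1, memory_order_relaxed);
        return arg;
    }

    static void *consumer(void *arg) {
        while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
            ;                                           /* spin until the flag is set  */
        atomic_thread_fence(memory_order_acquire);      /* fence: payload now visible  */
        printf("payload = %d\n", payload);
        return arg;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }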

Chapel (2)

Warlord88 (1065794) | more than 3 years ago | (#36266652)

I think the language Chapel [cray.com] being developed by Cray is taking concrete steps to make Parallel Programming easier. It is very similar in syntax to C++ and even supports some higher level constructs.

This looks like an advertisement for Chapel but I have no relation to Cray. Having taken a graduate parallel programming course, I cannot agree more with the statement that "Parallel Programming is difficult". I struggled a lot with pthreads and MPI before doing the final assignment in Chapel which was a pleasant surprise. The difference between serial and parallel code was FIVE characters only - and it gave a near linear speedup on three different machines.

Parallelisation of serial code is the problem (1)

GreatDrok (684119) | more than 3 years ago | (#36266718)

20 years ago, when I was working with transputers, we used Occam. It was a very pure parallel programming language and it wasn't too difficult. However, writing parallel code meant starting from scratch (getting rid of the dusty decks of old algorithms, as my professor described it). That never really happened, and we've ended up with primitive parallelisation nailed on to sequential code. There are many parallel architectures, SIMD, MIMD, distributed memory, shared memory and combinations of them all, and none of the languages currently available really suit them. You have basic multithreading and MPI bolted on to old sequential languages, and developers taking algorithms which absolutely must proceed in order and trying to find sections they can perform in parallel. This isn't going to get you good performance or scalability.

In Occam, we wrote procedures which were independent of each other and communicated over channels, all running at once. You wired them together, poured your data in at one end, and results came out at the other. Due to the very fine-grained parallelism it was extremely scalable - I had code running on 80+ transputers at over 90% efficiency in 1990, and it would have run happily on many more if I had had them. Yet it also ran perfectly on one, since serialisation of parallel programs is easy whereas parallelising serial code is very hard and inefficient.

I find it sad that the current state of parallel programming is still so far behind where we were 20 years ago due to all this legacy stuff. Even when people start out with new algorithms, they still start with serial processes and then try to parallelise, rather than considering the parallelism from the outset and designing the algorithm so it doesn't require everything in memory and doesn't produce different results when the number of CPUs changes.

Re:Parallelisation of serial code is the problem (1)

jd (1658) | more than 3 years ago | (#36267064)

Occam still exists. KROC supports the latest version of Occam (Occam-Pi), which supports mobile processes, generics and other concepts acquired from the developments in serial programming and computer clusters.

I consider it to be one of the finest languages out there for learning parallel programming and consider that most of the modern failures in the field are a result of people not knowing it and therefore not knowing the fundamentals. You can't run if you can't walk.

The transputer was the ultimate in computer cluster technology - a serial line is all it took. No switches, no routers, no expensive hardware, just raw mesh topology. It was a very sad day when it was sold to ST Electronics. It still exists, but is used only in multimedia appliances these days. A bit of a comedown for a chip that threatened to crush Cray.

Re:Parallelisation of serial code is the problem (1)

lucian1900 (1698922) | more than 3 years ago | (#36267414)

Erlang has most of the observable properties of Occam, so it's not all lost.

The problem is that people still insist on using mutable shared state in concurrent programs. If only they stopped doing silly things like that...

C.A.R. Hoare (0)

Anonymous Coward | more than 3 years ago | (#36266902)

Isn't this what Hoare formally described in "Communicating Sequential Processes"?

As a parallel computing programmer (0)

Anonymous Coward | more than 3 years ago | (#36266934)

...I have found that the strategy chosen for coordinating activity among and communicating between running bits of software on a parallel system usually causes most of the problems. Some blessed lead guy or (god forbid) manager will declare, "We are going to use ____" [threads | json over web services | CORBA | EJB | enterprise service bus | ...] and the project is stuck with that paradigm come hell or high water. Believe it or not, all parallel computing problems are not created equal. And it is often not until an engineering team has spent time puzzling out the data and process interactions that the best way to set up the solution becomes clear. Parallel systems magnify the liabilities of poor and/or rigid technology choices.

Also, the herd mentality of layering stacks of frameworks atop other stacks of virtual machines is a recipe for disappointment. For truly performance-critical applications you want to physicalize, not virtualize. You want to stop treating a cluster of boxes as a general-purpose computing resource and start treating it as a custom-crafted parallel appliance tailored for your problem domain. You want to get your knuckles dirty with hardware characteristics and then tailor your logic and data structures so that they can be pinned to NUMA nodes (for example) and utilize the caches and locally-attached RAM for the highest benefit.

Less dependence upon stacks of frameworks; more understanding of your problem, and tailoring a hardware and software solution to fit it more efficiently.

Efficiency is hard (4, Informative)

tlhIngan (30335) | more than 3 years ago | (#36266940)

Using locks and the like makes it very easy to do multithreaded and parallel programs.

The big problem comes when you need multiple locks, because you find your program waiting more on locks than anything else, which gums up the whole works and can easily lead to deadlocks and other fun stuff.

Another way is to consider lockless algorithms, which don't have such blocking mechanisms. However, then you get into issues where atomicity isn't quite so atomic thanks to memory queues and re-ordering done in the modern CPU, and thus have to start adding memory barriers before doing your atomic exchanges.

Raymond Chen (of Microsoft) did a nice write up of the lockfree ways to do things and what Windows provides to accomplish them.

http://blogs.msdn.com/b/oldnewthing/archive/2011/04/05/10149783.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/06/10150261.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/06/10150262.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/07/10150728.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/08/10151159.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/08/10151258.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/12/10152296.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/13/10152929.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/14/10153633.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/15/10154245.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/19/10155452.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/20/10156014.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/21/10156539.aspx [msdn.com]
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/22/10156894.aspx [msdn.com]
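
To make the "atomicity isn't quite so atomic" point concrete, here is a minimal lock-free stack push in C11 atomics; it is a generic sketch, not taken from the articles above, and a matching pop would additionally have to deal with the ABA problem:

    #include <stdatomic.h>
    #include <stddef.h>

    struct node {
        int value;
        struct node *next;
    };

    static _Atomic(struct node *) head = NULL;   /* top of a lock-free LIFO stack */

    /* Push without a lock: retry the compare-and-swap until no other thread has
       changed head in between. The release ordering publishes the node's contents. */
    void push(struct node *n) {
        struct node *old = atomic_load_explicit(&head, memory_order_relaxed);
        do {
            n->next = old;
        } while (!atomic_compare_exchange_weak_explicit(
                     &head, &old, n,
                     memory_order_release, memory_order_relaxed));
    }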

Re:Efficiency is hard (1)

lucian1900 (1698922) | more than 3 years ago | (#36267468)

The problem is using mutable shared state. If you give up one of those two properties, keeping state either mutable but private or shared but immutable, writing correct and efficient programs suddenly becomes much easier.

Re:Efficiency is hard (2)

Prune (557140) | more than 3 years ago | (#36267686)

Worse, typical synchronization primitives, such as those in pthreads and Windows, are optimized not for speed but for error handling and the like. It is easy to beat their speed with custom implementations, sometimes dramatically (see for example the numerous articles at http://locklessinc.com/articles/ [locklessinc.com]; I've implemented some of the mutexes and ticket locks, and my Windows/Linux software has become faster as a result). In addition, using lock-free algorithms whenever possible can provide a further performance boost (various algorithms and optimizations at www.1024cores.net), but implementing those assumes an understanding of memory consistency semantics (acquire, release, consume, weak ordering, strong consistency, etc.) in order to minimize implied or explicit memory barriers, as well as thread-local storage tricks to minimize data dependencies; the latter become all the more important as the number of cores increases and memory necessarily becomes less uniform in access times. Add to this the need for cache-aware and/or cache-oblivious algorithms, and it is very clear that optimization in a modern hardware environment is anything but simple, and moreover cannot be made simple.
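
For readers who haven't met them, a ticket lock is small enough to sketch in C11 atomics; this is a simplified illustration, not the locklessinc implementation:

    #include <stdatomic.h>

    /* Ticket lock: take a ticket, wait until it is being served.
       Threads are admitted strictly in arrival order, which avoids starvation. */
    typedef struct {
        atomic_uint next_ticket;   /* ticket dispenser          */
        atomic_uint now_serving;   /* ticket currently admitted */
    } ticket_lock;

    void ticket_lock_acquire(ticket_lock *l) {
        unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                                memory_order_relaxed);
        while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
            ;   /* spin; a real implementation would pause or back off here */
    }

    void ticket_lock_release(ticket_lock *l) {
        atomic_fetch_add_explicit(&l->now_serving, 1, memory_order_release);
    }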

Now what? How to actually monitor and debug? (1)

aharth (412459) | more than 3 years ago | (#36267142)

Parallel programming is hard. Bummer. I'd be more interested in an article that talks about *monitoring and debugging* parallel programs (I currently struggle to monitor parallel algorithms implemented in Java). Anybody?

The answer is... (1)

drb226 (1938360) | more than 3 years ago | (#36267194)

that most of today's popular programming languages do not accommodate higher-level forms of expression required for easy parallelism. Declarative languages have a slight edge at being able to express where sequential dependencies are.

Functional programming (1)

Logic and Reason (952833) | more than 3 years ago | (#36267264)

One more reason why functional programming matters. Many programs become trivial to parallelize when you avoid mutation and side-effects outside of limited, carefully-controlled contexts.

It's truly a joy when you can parallelize your code by changing a single character (from "map" to "pmap"). There's usually a little more to it than that, but you almost never have to deal with explicit locks or synchronization.

Re:Functional programming (0)

Anonymous Coward | more than 3 years ago | (#36267422)

Agree completely, functional programming is a key enabling paradigm for the parallel world. I think that's why so many .NET developers are stoked with the PLINQ [microsoft.com] extensions... it just makes life so easy.

Re:Functional programming (1)

jbolden (176878) | more than 3 years ago | (#36267462)

Agreed. Learning to isolate side effects is one of the best things a programmer will get from FP.

Re:Functional programming (1)

Aater Suleman (2206330) | more than 3 years ago | (#36267846)

I agree with that, but they should only be used after you know what's under the hood. If we "teach" parallel programming that way, then programmers will never understand the underlying challenges. It will all become magic to them, like Java is. See my post on this very topic: http://www.futurechips.org/thoughts-for-researchers/csee-professors-graduates-understand-computers.html [futurechips.org]

Side-effects (0)

Anonymous Coward | more than 3 years ago | (#36267306)

Side-effects are the pantomime villain of parallel programming.

This has been known since the late fifties.

The fact nobody has given a shit until recently is not surprising, but I don't have much sympathy either.

Poor choice of language (1)

jbolden (176878) | more than 3 years ago | (#36267374)

What makes parallel programming hard is poor languages: languages that allow state changes and don't keep them isolated. Isolate changes of state, all changes of state, and be careful about what kinds of combinators you use. Google map-reduce works whenever:

a) You can organize your data into an array
b) You can do computations on each array element in isolation
c) You can do computations on the entire array via associative operations, pairwise

And most programs do meet those properties, but they slip in all sorts of subtle state changes, so out-of-order execution isn't possible. The key to parallelism is better language selection.
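
A minimal C sketch of conditions (a) through (c): independent per-element work combined with an associative operation, here written as an OpenMP reduction (the data and the squaring step are made up for illustration):

    #include <stdio.h>

    int main(void) {
        static double data[1000000];
        for (int i = 0; i < 1000000; i++)
            data[i] = i;

        double sum = 0.0;
        /* (b) each element is processed in isolation;
           (c) results are combined with an associative operation (+),
               so the runtime is free to reduce them pairwise in any order. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000000; i++)
            sum += data[i] * data[i];

        printf("%f\n", sum);
        return 0;
    }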

Parallel programming *NOT* widely needed (3, Interesting)

ljw1004 (764174) | more than 3 years ago | (#36267536)

The dirty secret of parallel programming is that it's *NOT* so widely needed. I think a lot of academics got funding to study automatic parallelization or other parallel techniques, and they latched on to multicore as a justification for it, but it isn't one.

There is only one GOOD reason to use multithreading -- because your work is compute-bound. This typically happens on large-data applications like audio/video processing (for which you just call out to libraries that someone else has written), or else on your own large-data problems that have embarrassingly trivial parallelization, e.g.:

var results = from c in Customers.AsParallel() where c.OrderStatus == "unfulfilled" select new { Name = c.Name, Cost = c.Cost };

Here, using ParallelLINQ, it's as simple as just sticking in "AsParallel()". The commonest sort of large-data problems don't have any complicated scheduling.

There are also BAD reasons why people have used multithreading, particularly to deal with long-latency operations like network requests. But this is a BAD reason, and you shouldn't use multithreading for it. There are better alternatives, as shown by the Async feature in F#/VB/C# (which I worked on, and which was also copied into JavaScript with Google's Traceur compiler), e.g.:

Task<string> task1 = new WebClient().DownloadStringTaskAsync("http://a.com");
Task<string> task2 = new WebClient().DownloadStringTaskAsync("http://b.com");
Task<string> winner = await Task.WhenAny(task1, task2);
string result = await winner;

Here it kicks off two tasks in parallel. But they are cooperatively multitasked on the same main thread at the "await" points. Therefore there is *NO* issue about race conditions; *NO* need to use semaphores/mutexes/condition-variables. The potential for unwanted interleaving is dramatically reduced.

So in the end, who are the people who still need to develop multithreaded algorithms? There are very few. I think they're just the people who write high-performance multithreaded libraries.

Seriously? (0)

Anonymous Coward | more than 3 years ago | (#36267582)

What kind of imbecile posted this? Did he/she think that optimising should be simpler than just getting them to run? Besides, if you are unfamiliar with parallel code debugging, perhaps you are not the best person to judge what is enlightening and what is not. Try selling used cars instead.

Honestly... (1)

emanem (1356033) | more than 3 years ago | (#36267586)

...very simple article...was expecting something more technical!
Anyway, cheers for the effort to write it!

Ps. Looks like he discovered an open secret :-P

Re:Honestly... (1)

Aater Suleman (2206330) | more than 3 years ago | (#36267778)

...very simple article...was expecting something more technical! Anyway, cheers for the effort to write it! Ps. Looks like he discovered an open secret :-P

In my defense, I wasn't targeting parallel programming experts:-) The theme of my post was to familiarize people who don't know parallel programming with the challenges.

Re:Honestly... (1)

emanem (1356033) | more than 3 years ago | (#36267990)

Fair play mate!

I guess you won't be explaining the cache line size issue and memory alignment too, right? :-)
As I said, thumbs up at least for the effort!

Cheers!

Ps. Questions for Java experts, how does the JVM (the best implementation btw) deal with cache line size and memory alignment issues? Is there any optimization in this area?

The problem: too much data sharing (1)

Animats (122034) | more than 3 years ago | (#36267622)

The basic problem with parallel programming is that, in most widely used languages, all data is by default shared by all threads. C, C++, and Python all work that way. The usual bug is race conditions.

There have been many languages for parallel programming which don't have default sharing, but they've never taken over outside some narrow niches. Partly because most of them weren't that useful outside their niche.

The other classic problem is that in most shared-data languages with locks, the language doesn't know what the lock is protecting. So you can still code race conditions by accident.
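
A deliberately buggy C sketch of the default-sharing problem: both threads mutate the same counter with no lock, so the final value is unpredictable (compile with -pthread; the iteration count is arbitrary):

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;        /* shared by default -- this is the race */

    static void *bump(void *arg) {
        for (int i = 0; i < 1000000; i++)
            counter++;              /* read-modify-write, not atomic */
        return arg;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("%ld\n", counter);   /* almost never 2000000 */
        return 0;
    }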

Re:The problem: too much data sharing (1)

Aater Suleman (2206330) | more than 3 years ago | (#36267740)

The basic problem with parallel programming is that, in most widely used languages, all data is by default shared by all threads. C, C++, and Python all work that way. The usual bug is race conditions.

There have been many languages for parallel programming which don't have default sharing, but they've never taken over outside some narrow niches. Partly because most of them weren't that useful outside their niche.

The other classic problem is that in most shared-data languages with locks, the language doesn't know what the lock is protecting. So you can still code race conditions by accident.

Hey, my concern is that there is lots of code where I do want to assume sharing (since it's easier to think that way for some problems). MPI is kind of like what you are saying, and programming in MPI is not much fun either.

Re:The problem: too much data sharing (1)

lucian1900 (1698922) | more than 3 years ago | (#36267944)

The problem is the combination of mutable and shared state. If state is either immutable shared or mutable private, everything is much easier.

Things get even more interesting with persistent immutable data structures, like Clojure's.

Scala has added parallel collections and has Akka (1)

MCRocker (461060) | more than 3 years ago | (#36267928)

The Scala [scala-lang.org] community has tried to move the problem into a more practical realm by adding things like parallel collections [infoq.com], DSLs [scala-lang.org] to abstract out the problem for specific applications, and the Akka Project [akka.io] for simpler concurrency.

Most of the parallel programming discussion I've seen is very complicated and not likely to appeal to those who have to do practical day-to-day business projects. By pushing the abstractions up a level, I think the Scala folks have made parallel programming more accessible for the average developer.

Ref: Parallel Collections video [scala-lang.org] .

It's not. (0)

Anonymous Coward | more than 3 years ago | (#36267996)

Seriously. Parallel programming IS NOT hard. I'm tired of people complaining about this; in 90% of cases, it's trivially easy. Like most things in programming, there are some hiccups in a small fraction of what you do, but that's just part of the game. Parallel computing isn't somehow magically more difficult than programming in general. Dear God.

Unless of course we're talking about dealing with terribly written bolt on libraries for circa 1980s programming languages that should not be in use in the modern era. That can be hard -- but that's the fault of old technology being expected to handle new problems and a lack of investment in creating new tools in the form of system-level languages...it has little or nothing to do with the underlying challenges of parallel programming itself.
