
An Overview of Parallelism

kdawson posted more than 7 years ago | from the cores-galore dept.

Programming 197

Mortimer.CA writes with a recently released report from Berkeley entitled "The Landscape of Parallel Computing Research: A View from Berkeley": "Generally they conclude that the 'evolutionary approach to parallel hardware and software may work from 2- or 8-processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism.' This assumes things stay 'evolutionary' and that programming stays more or less as it has in previous years (though languages like Erlang can probably help to change this)." Read on for Mortimer.CA's summary from the paper of some "conventional wisdoms" and their replacements.
Old and new conventional wisdoms:

  • Old CW: Power is free, but transistors are expensive.
  • New CW is the "Power wall": Power is expensive, but transistors are "free." That is, we can put more transistors on a chip than we have the power to turn on.

  • Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.
  • New CW: As chips drop below 65-nm feature sizes, they will have high soft and hard error rates.

  • Old CW: Multiply is slow, but load and store is fast.
  • New CW is the "Memory wall" [Wulf and McKee 1995]: Load and store is slow, but multiply is fast.

  • Old CW: Don't bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
  • New CW: It will be a very long wait for a faster sequential computer (see above).



would that be (5, Funny)

macadamia_harold (947445) | more than 7 years ago | (#17967086)

Mortimer.CA writes with a recently released report from Berkeley entitled "The Landscape of Parallel Computing Research: A View from Berkeley"

Would that be a Parallelograph?

Re:would that be (2, Funny)

Alwin Henseler (640539) | more than 7 years ago | (#17967324)

No, more likely a Berks Eye View.

nothing new here... (3, Informative)

Anonymous Coward | more than 7 years ago | (#17967166)

pretty much the same thing Dave Patterson's been saying for a while now...in fact, the CW sounded so familiar, I went back to double check his lecture slides from more than a year ago:

http://vlsi.cs.berkeley.edu/cs252-s06/images/1/1b/Cs252s06-lec01-intro.pdf [berkeley.edu]

and it's pretty much identical (check out slide 3 on the first page of the pdf)

Erlang (5, Insightful)

Anonymous Coward | more than 7 years ago | (#17967176)

Erlang only provides a way of proving parallel correctness, a la CSP. This means avoiding deadlocks and such. The primary difficulty of crafting algorithms to run efficiently over multiple CPUs still remains. Erlang does not do any automatic parallelization, and expects the programmer to write the code with multiple CPUs in mind.


I'm waiting for a language which would parallelize stuff for you. This is most likely to be a functional language, or an extension to an existing functional language. Maybe even Erlang.

Is it a compiler problem or an os problem? (1, Interesting)

Anonymous Coward | more than 7 years ago | (#17967410)

Most of the time a computer isn't churning away on a single problem that needs to be parallelized. In that respect, the solution rests more with the operating system. Something like a server could relatively easily make use of almost any number of processors; one per client maybe. The trouble comes with bus contention. It is a problem similar to designing a subnet. If you have too many computers/processors, communications grind to a halt. In other words, adding processors slows things down rather than speeding them up.

The solution to adding more processors may be architectural. Adding more buses, a la the Harvard architecture for example, might be an effective approach. This is the approach used in DSPs, which are a lot more powerful on a per-cycle basis than more conventional architectures.

Harvard architecture since i486 (0)

tepples (727027) | more than 7 years ago | (#17967594)

Most of the time a computer isn't churning away on a single problem that needs to be parallelized.
Perhaps a server can be parallelized, but what about a home computer?

Adding more buses, a la the Harvard architecture for example, might be an effective approach.
There has already been a Harvard architecture inside PC-class CPUs since the i486, as the level 1 cache is split into separate sections with separate buses for instructions and data.

Re:Is it a compiler problem or an os problem? (2, Insightful)

Doctor Memory (6336) | more than 7 years ago | (#17968850)

Most of the time a computer isn't churning away on a single problem that needs to be parallelized. In that respect, the solution rests more with the operating system.
True, and in the short term I'd imagine that's where most of the improvements will appear, but context switches are expensive. It's more efficient if you can switch execution to another thread within the same context, so you can (hopefully) still use the data in your I & D caches, although you do still have a register spill/reload.

Speaking of architecture changes, it sounds like Intel is going down the same road the Alpha team did: more caching. I remember reading an article about one system DEC made (ISTR this was about the time the 21264 came out) that had 1M of L1 cache, 2M of L2, and 8M of L3. I wonder how much cache they could squeeze onto a chip, given current power handling.

Re:Erlang (1)

LLuthor (909583) | more than 7 years ago | (#17967440)

Erlang also has a huge amount of overhead, and due to immutable data structures, has to spend a lot of time copying data around.

This is the problem inherent with pure languages. Compilers/runtime systems are simply not sophisticated enough yet to reason about and perform as well as a human can with mutable data structures.

This is why C/C++ will dominate scalable applications for at least a few more years.

Re:Erlang (2, Informative)

grammar fascist (239789) | more than 7 years ago | (#17967834)

Erlang also has a huge amount of overhead, and due to immutable data structures, has to spend a lot of time copying data around.

This is the problem inherent with pure languages. Compilers/runtime systems are simply not sophisticated enough yet to reason about and perform as well as a human can with mutable data structures.

Haskell comes pretty close, and it's designed from the beginning to be pure.

In fact, it may be these immutable data structures that make the pure functional languages able to perform well on multiple processors where sequential languages with mutable data structures fail miserably. If you don't have to worry about a function's side effects and no specific execution order is guaranteed, you can run it on any processor you want whenever you want, as long as you don't hold up the things that depend on it for too long.

However, pure functional languages need to be made more palatable to your average Joe Programmer. (Or computer scientists need to get more mathematical. Or both.) We need the Python and Ruby of pure functional languages: languages in which the syntax design is regarded as a user-interface problem.

Java invokeLater() (1)

Latent Heat (558884) | more than 7 years ago | (#17968062)

In Java, you can have threads communicate by passing objects that you treat as immutable -- a thread creates an object, passes it as an argument to invokeLater(), where it gets synchronized with the event queue, acted upon, and then garbage collected. That the thread that called invokeLater() on an object no longer retains any references to that object is a matter of programming discipline, but that style of multi-threaded programming is available.

The question or concern I have is: if one programs threads that way, communicating by creating, passing, and garbage-collecting objects, is the garbage collector happy with this, or does it create more overhead than you save with the thread parallelism (on multi-core, of course)?
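For illustration, here is a minimal sketch of the pattern described above, using the standard javax.swing.SwingUtilities.invokeLater() call; the Reading class and its fields are invented for the example.

    import javax.swing.JLabel;
    import javax.swing.SwingUtilities;

    // Hypothetical immutable message: all fields final, no setters, so the
    // producing thread can hand it to the event thread and drop its reference.
    final class Reading {
        final long timestamp;
        final double value;
        Reading(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    public class InvokeLaterDemo {
        static final JLabel status = new JLabel("waiting...");

        public static void main(String[] args) {
            // A worker thread builds the object off the event thread...
            new Thread(new Runnable() {
                public void run() {
                    final Reading r = new Reading(System.currentTimeMillis(), 42.0);
                    // ...and queues it for the Swing event thread. The worker keeps
                    // no reference afterwards, so once the Runnable has run, the
                    // Reading is unreachable and is collected like any other garbage.
                    SwingUtilities.invokeLater(new Runnable() {
                        public void run() {
                            status.setText("got " + r.value);
                        }
                    });
                }
            }).start();
        }
    }

As for the garbage-collection worry: short-lived message objects like this tend to die in the young generation, which generational collectors reclaim cheaply, so in practice the allocation traffic is rarely where the time goes.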

Re:Java invokeLater() (0)

Anonymous Coward | more than 7 years ago | (#17969546)

What? didn't anyone ever tell you it uses magic?

Re:Erlang (1)

zsau (266209) | more than 7 years ago | (#17968348)

Surely functional programming languages are where the idea of syntax design as UI problem came from! Lisp on the one hand has a very simple yet incredibly powerful syntax; whereas Haskell has an incredibly beautiful syntax with more sugar than you can dream of. Neither has the same kind of hard-and-fast distinction between built-ins and user-defined structures that other languages have, either; I can just as easily define an infix function or a macro that doesn't expand its third argument in Haskell or Lisp (respectively) as I can define a regular all-expanding prefix function.

What Haskell needs is to change the name of the IO monad to "warm fuzzy thing" or somesuch and to give it a syntax a bit more like regular procedural languages. And also to improve its efficiency.

Also, more on topic, is Haskell, or any other lazy pure functional language, actually capable of automatically parallelising code, or is that just some theoretical idea that will need a lot of research & work before it's possible? I didn't think Haskell used multiple processors...

Re:Erlang (1)

grammar fascist (239789) | more than 7 years ago | (#17968914)

Surely functional programming languages are where the idea of syntax design as UI problem came from! Lisp on the one hand has a very simple yet incredibly powerful syntax; whereas Haskell has an incredibly beautiful syntax with more sugar than you can dream of.

Okay, I'll be more specific: they need to address syntax as a human user interface design problem.

Lisp. UI. Heh. Lisp syntax is a good user interface if sendmail.cf is. Its near lack of consistent visual cues disqualifies it completely.

Haskell is a good UI for mathematicians, who spend much of their lives learning to not think like normal people. Computer scientists are somewhere between normal people and mathematicians. Target them.

Re:Erlang (1)

neckjonez (731991) | more than 7 years ago | (#17967628)

See the Sun Fortress project http://fortress.sunsource.net/ [sunsource.net] for a language with built-in parallelism... It's a beginning, anyway.

Re:Erlang (2, Insightful)

CastrTroy (595695) | more than 7 years ago | (#17967740)

I wrote some parallel code using MPI [lam-mpi.org] in university. It takes a lot of work to get the hang of at first, and many people I know who were good at programming had lots of trouble in this course, because programming for parallelism is very different from programming for a single processor. On the other hand, you can get much better performance from parallel algorithms. However, I think that we could do just as well sticking with the regular algorithms and having a lot of threads each running on a different core. If you look at an RDBMS, it would be nice if you could sort in less than n log(n) time, but it's even better if you just sort in n log(n) and can run 128 sorts simultaneously. I seem to remember some news about Intel saying they would have 128 core chips available in the near future.
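As a rough sketch of the "run 128 ordinary sorts at once" idea, here is what it might look like with the standard java.util.concurrent executor API; the array sizes and job count are arbitrary.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ParallelSorts {
        public static void main(String[] args) throws Exception {
            // One worker thread per core; each job is an ordinary sequential
            // n log n sort, and the parallelism is across jobs, not inside one.
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);

            List<Callable<int[]>> jobs = new ArrayList<Callable<int[]>>();
            Random rnd = new Random(0);
            for (int i = 0; i < 128; i++) {
                final int[] data = new int[100000];
                for (int j = 0; j < data.length; j++) data[j] = rnd.nextInt();
                jobs.add(new Callable<int[]>() {
                    public int[] call() {
                        Arrays.sort(data);   // plain sequential sort per job
                        return data;
                    }
                });
            }
            pool.invokeAll(jobs);            // blocks until all 128 sorts finish
            pool.shutdown();
        }
    }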

Re:Erlang (5, Informative)

cpm80 (899906) | more than 7 years ago | (#17968946)

I think using a *specific* language for automatic parallelization is the wrong way. Some GNU folks are working on language-independent autoparallelization for GCC 4.3. Their implementation is an extension to the OpenMP implementation in GCC. Read "OpenMP and automatic parallelization in GCC" (D. Novillo, GCC Developers' Summit, Ottawa, Canada, June 2006; http://people.redhat.com/dnovillo/Papers/ [redhat.com]) for details.

It's not hard (4, Insightful)

PhrostyMcByte (589271) | more than 7 years ago | (#17967204)

I think the main reason people say "don't use threads" is because while single threaded apps are easy to debug, multi-threaded ones will crash and burn at seemingly random places if the programmer didn't plan ahead and use proper locking. This is probably good advice to a noob programmer but I otherwise can't stand people who are of the "absolutely, never, ever, use threads" mindset.

Some applications have no need to be multithreaded, but when they do it is a lot easier than people make it out to be. Taking advantage of lock-free algorithms and NUMA for maximum scalability *can* be hard, but the people who need these will have the proper experience to tackle it.

Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language (maybe c++1x) comes along the current ones work just fine.
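To make "plan ahead and use proper locking" concrete, a minimal Java sketch: every access to the shared counter goes through one lock, so increments from different threads never race. The Counter class and the thread counts are made up for the example.

    public class Counter {
        private final Object lock = new Object();
        private long count = 0;

        // All reads and writes of the shared field happen under the same lock.
        public void increment() {
            synchronized (lock) { count++; }
        }

        public long get() {
            synchronized (lock) { return count; }
        }

        public static void main(String[] args) throws InterruptedException {
            final Counter c = new Counter();
            Thread[] threads = new Thread[8];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        for (int j = 0; j < 1000000; j++) c.increment();
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads) t.join();
            System.out.println(c.get());     // always 8000000 with the lock in place
        }
    }

Drop the synchronized blocks and the printed total comes up short on almost every run, which is the "crash and burn at seemingly random places" failure mode in its mildest form.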

Re:It's not hard (0)

Anonymous Coward | more than 7 years ago | (#17967272)

Anyone know of any that try to modify existing programs to run well in parallel?

Re:It's not hard (1)

QuesarVII (904243) | more than 7 years ago | (#17967542)

The Intel C/C++ and Fortran compilers both support auto parallelization. I've never benchmarked their effectiveness though. I'm sure a quick googling could get some results.

Re:It's not hard (3, Interesting)

ardor (673957) | more than 7 years ago | (#17967302)

Well, the nondeterministic nature of multithreading is still a problem. With one thread, debugging is simple: the only thread present will be stopped. But with multiple threads, how is the debugger supposed to handle the problem? Stop all threads? Only the current one? Etc. This is relevant when debugging race conditions.

Also, the second great problem is that thread problems are hard to find. When I write a class hierarchy, an OOP language can help me see design errors (for example, unnecessary multiple inheritance) or lapses in const-correctness. Threading, however, is only visible as mutexes, conditions, etc.

One other issue with threads is that they effectively modify the execution sequence. Traditional single-threaded programs have a sequence that looks like a long line. Threading introduces branches and joins, turning the simple line into a net. Obviously, this complicates things. Petri nets can be useful in modeling this.

Re:It's not hard (1)

cluckshot (658931) | more than 7 years ago | (#17967578)

It's just my opinion, but I think the problem with parallel programming is a substantial lack of processors. If, for example, a die were made with 10 million (very simple) processors with modest queue memory, there are applications in optical data processing that would become extremely efficient and much more natural than any current solution. Otherwise we just spend our time loading the queue of one or four processors millions of times to do exactly the same operation on the data from each pixel of a camera, for example.

Having 4 processors is not parallel processing, it is just divided serial processing. Parallel processing means that the process is in fact done in parallel rather than simply divided into 4 pipes. The thinking that dominates processor research at this time is how to do these "serial" processes faster. Duh! We are going to parallel processes here, folks! It requires thinking in a whole different paradigm. It isn't serial processing, so of course the guys with serial brains are not going to get the process figured out.

Multi-threading is merely serial operations done with time interrupts and conditional interrupt flags. It is by definition serial. The parent post isn't bad, it begins to get there, but really we have to get out of that mindset. Parallel will operate like a single serial operation set applied in several million sets at the same time, without interrupts. Parallel programming will have many other features, including a reconstruction of the base process steps after the parallel part is done. To illustrate: if you were to do a simple operation such as comparing one video frame with another in real time, with a corresponding number of processors you could compare every pixel in one clock cycle. At the end of that cycle you could instruct another set of operations, such as a time-decayed comparison of many frames.

Re:It's not hard (1)

maxwell demon (590494) | more than 7 years ago | (#17967782)

Not all algorithms can be parallelized that easily. Imagine e.g. a parser: You cannot parse text by having a million processors looking at one character each.

Re:It's not hard (3, Insightful)

mikael (484) | more than 7 years ago | (#17967998)

Not all algorithms can be parallelized that easily. Imagine e.g. a parser: You cannot parse text by having a million processors looking at one character each.

You could have the first processor split the text by whitespace. Then each block of characters is assigned to any number of processors to find the matching token. I've seen some parsers where the entire document was read in and converted into an array of tokens before returning to the calling routine.
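A rough Java sketch of the split-then-tokenize-in-parallel idea from the comment above; the "tokenizing" here is just lowercasing each chunk, standing in for whatever real token matching would do.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelTokenize {
        public static void main(String[] args) throws Exception {
            String text = "The quick brown Fox jumps over the lazy Dog";

            // Step 1 (sequential): split on whitespace. As the replies note,
            // this step bounds the whole thing -- Amdahl's law in miniature.
            String[] chunks = text.split("\\s+");

            // Step 2 (parallel): hand each chunk to the pool to be "tokenized".
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            List<Future<String>> results = new ArrayList<Future<String>>();
            for (final String chunk : chunks) {
                results.add(pool.submit(new Callable<String>() {
                    public String call() {
                        return chunk.toLowerCase();  // stand-in for real token matching
                    }
                }));
            }

            // Step 3 (sequential): collect the tokens back in document order.
            List<String> tokens = new ArrayList<String>();
            for (Future<String> f : results) tokens.add(f.get());
            pool.shutdown();
            System.out.println(tokens);
        }
    }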

Re:It's not hard (1)

vadim_t (324782) | more than 7 years ago | (#17968158)

But in that case, your speed is limited by the speed of the splitting by whitespace. And if you've got lots of CPUs, chances are they're quite slow on their own.

Then there's the fact that you can hardly split on whitespace with a complex syntax. You certainly won't be able to parse C like that.

Re:It's not hard (1)

gumbi west (610122) | more than 7 years ago | (#17968462)

I think this is a great example of where serial mindsets inhibit us. Parsing in parallel makes a whole lot of sense once you're ready for it. Granted, you need parallel data storage to make good on parallel parsing, but it's feasible if you are ready to restructure your whole mindset.

There is a very interesting theory provided by Trefethen and Bau in Numerical Linear Algebra that the existence of exact algorithms for most of the complicated interesting matrix operations may have slowed our finding a faster algorithm that approximates the result with arbitrary (i.e. slightly non-zero) error. It's been several years, but I thought that eigenvalue problems when approximated to within a few times machine epsilon can be found 1/2 order faster or some such thing than using the exact methods.

Re:It's not hard (1)

ringm000 (878375) | more than 7 years ago | (#17969336)

A parser is useless by itself. Usually you parse data to process it somehow, and processing the data (e.g. code generation in a compiler) can often also be split into separate passes. These passes can communicate rather efficiently through a lock-free queue.
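A sketch of that two-pass pipeline in Java, using java.util.concurrent.ConcurrentLinkedQueue (a lock-free queue) between a "parse" stage and a "codegen" stage; both stages here are trivial placeholders.

    import java.util.concurrent.ConcurrentLinkedQueue;

    public class ParserPipeline {
        private static final String END = "<end>";  // sentinel marking end of stream

        public static void main(String[] args) throws InterruptedException {
            // Lock-free queue carrying "parsed" items from stage 1 to stage 2.
            final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<String>();

            // Stage 1, the "parser": here it just splits a line into tokens.
            Thread parser = new Thread(new Runnable() {
                public void run() {
                    for (String tok : "int x = 1 + 2 ;".split(" ")) {
                        queue.add(tok);
                    }
                    queue.add(END);
                }
            });

            // Stage 2, the "code generator": consumes tokens as they appear.
            Thread codegen = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        String tok = queue.poll();
                        if (tok == null) { Thread.yield(); continue; }  // queue momentarily empty
                        if (tok.equals(END)) break;
                        System.out.println("emit code for: " + tok);    // placeholder work
                    }
                }
            });

            parser.start();
            codegen.start();
            parser.join();
            codegen.join();
        }
    }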

Re:It's not hard (1)

tepples (727027) | more than 7 years ago | (#17967608)

I think the main reason people say "don't use threads" is because while single threaded apps are easy to debug, multi-threaded ones will crash and burn at seemingly random places if the programmer didn't plan ahead and use proper locking.
Do the C standard and the C++ standard go into detail as to the semantics of this locking? No, they don't even specify what a thread is. This makes it less likely that students learn proper locking in college.

Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language (maybe c++1x) comes along the current ones work just fine.
Which "current ones"? Do POSIX threads work on Microsoft Windows?

Re:It's not hard (1)

maxwell demon (590494) | more than 7 years ago | (#17967884)

Do the C standard and the C++ standard go into detail as to the semantics of this locking?

AFAIK C++0x (i.e. the next version of the C++ standard) will.

Re:It's not hard (1)

jlarocco (851450) | more than 7 years ago | (#17968642)

Do the C standard and the C++ standard go into detail as to the semantics of this locking? No, they don't even specify what a thread is. This makes it less likely that students learn proper locking in college.

I don't think that's relevant. The ISO standard for Ada completely specifies the semantics of threads and thread locking, but that doesn't necessarily make the problem any easier. Of course Ada usually isn't taught in schools, but (unfortunately) C and C++ aren't usually taught in schools either, except for an occasional "C++ Programming" class.

Which "current ones"? Do POSIX threads work on Microsoft Windows?

I'm not exactly sure what you mean. There are several languages that have parallelism built into the language itself. In most of them, the underlying OS and computer architecture are irrelevant. Ada and Erlang come to mind, but there are others.

Re:It's not hard (1)

FranklinDelanoBluth (1041504) | more than 7 years ago | (#17969780)

Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language (maybe c++1x) comes along the current ones work just fine.
Which "current ones"? Do POSIX threads work on Microsoft Windows?

The real application domain of multi-threading and data-level parallelism is stuff that one would never run on a Windows client: web servers, databases, scientific computing, ray tracing, etc. The improvement that these new paradigms offer is mostly in terms of throughput, not response time.

I would also add that to program on many of the new multi-core architectures, a programmer will want to be able to specify what runs on which core. Take for example the Cell processor, where each SPE has only a 256KB local store and limited branch prediction. A miss in the store or a mispredicted branch would be catastrophic for performance. Instead, the programmer writes small specialized programs that have minimal branching. To construct larger systems, programs on adjacent (note: adjacency is key!) SPEs pass their output down an assembly line of sorts, each adding a little bit of work to the output. To effectively utilize this system, a programmer will want very tight control over the assignment of threads to SPEs.

Even though modern desktops running XP/Vista are multi-processor machines (Core 2s and 64 X2s), these are still quite far from the state of the art in terms of exploiting multi-threading. If you want to see what it's really about, look at the Sun T1: 8 cores, 4 hardware threads per core.

If you are looking to speed up single-threaded applications (i.e. most of the stuff that consumers use) you should be looking to fight the "Memory Wall." Most of today's computers (even with the newly reduced pipelines) still spend most of their time stalling on cache misses (it's on the order of 10^2 cycles to retrieve from RAM). Looking for ways to maximize cache locality/residence would be much more fruitful in terms of turnaround time, though again this tends to be something which must be done explicitly and cannot be abstracted behind a programming language.
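To put a rough number on the cache-locality point, a small Java experiment: the same summation over the same matrix, walked in two different orders. The matrix size is arbitrary; on typical hardware the column-order walk is several times slower because it touches a different row array, and usually a different cache line, on every access.

    public class LocalityDemo {
        static final int N = 4096;
        static final int[][] m = new int[N][N];   // about 64 MB of ints

        public static void main(String[] args) {
            long sum = 0, t;

            // Row order: walks each row contiguously, so every cache line
            // fetched from RAM gets fully used before moving on.
            t = System.nanoTime();
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    sum += m[i][j];
            System.out.println("row order:    " + (System.nanoTime() - t) / 1000000 + " ms");

            // Column order: hops to a different row array on every access,
            // so nearly every load is a cache miss.
            t = System.nanoTime();
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    sum += m[i][j];
            System.out.println("column order: " + (System.nanoTime() - t) / 1000000 + " ms");

            System.out.println(sum);  // keep the loops from being optimized away
        }
    }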

Re:It's not hard (1)

MBCook (132727) | more than 7 years ago | (#17967680)

Maybe we should teach the basics. I don't remember my CS program (this was '01-'05 or so) teaching anything about working with threads. Everything I know about threads and threaded programming has been picked up by reading books on threading, experimenting with threading, and what I've learned on my job from others who know threading. I'm not going to claim I'm that competent, though.

What do I know about NUMA and other complex issues of thread handling and modern computers? What I know in that category came from reading about the Linux kernel and following its development. Articles explaining what they were doing and why (especially on LWN.net) have taught me a TON.

But the interfaces to program threads in C/C++ are external libraries or fork. Java has threads built in but it still feels like quite a bit of work for something that should be simpler. People talk about Erlang, but I don't know how much better it is (I've never used it).

The debugging instruction I was taught in school was basically what was necessary to help us get our programs done. I've learned quite a bit more since getting my job. And God forbid they teach us profiling or something, let alone multi-threaded programming or ESPECIALLY debugging it.

Maybe one of the reasons we see so few multi-threaded programs (besides the fact that making algorithms parallel isn't easy) is how few people are actually trained to do it.

You don't even have to teach students how to do all this stuff; just give them a basic knowledge so they can keep an eye toward it in the future when they need to learn more, instead of the sink-or-swim situation they may get stuck in.

Parallelizing Compilers (3, Insightful)

jd (1658) | more than 7 years ago | (#17967912)

This problem was "solved" (on paper) in the mid 1970s. Instead of writing a highly complex parallel program that you can't readily debug, you write a program that the computer can generate the parallel code for. Provided the compiler is correct, the sequential source and the parallel binary will be functionally the same, even though (at the instruction level) they might actually be quite different. What's more, if you compile the sequential source into a sequential binary, the sequential binary will behave exactly the same as the parallel version (only much slower).

Any reproducible bug in the parallel binary will be reproducible given the same set of inputs on the sequential binary, which you can then debug as you have the corresponding sequential source code.

So why isn't this done? Automagically parallelizing compilers (as opposed to compilers that merely parallelize what you tell them to parallelize) are extremely hard to write. Until the advent of Beowulf clusters, low-cost SMP and low-cost multi-core CPUs, there simply haven't been enough machines out there capable of sufficiently complex parallelism to make it worth the cost. It has been far cheaper to simply make a complex-enough inter-process communication system, with a million ways to signal and a billion types of events. Any programmer who complains they can't use that mess can then be burned at the stake for their obvious lack of appreciation for all these fine tools.

Have you ever run GCC with maximum profiling over a program, tested the program, then re-run GCC using the profiling output as input to the optimizer? It's painful. Now, to parallelize, the compiler must automatically not just do one trivial run but get as much coverage as possible, and then not just tweak some optimizer flags but run some fairly hefty heuristics to guess what a parallel form might look like. And it will need to do this not just the once, but many times over to find a form that is faster than the sequential version and does not result in any timing bugs that can be picked up by automatic tools.

Spending a small fortune on building a compiler that can actually do all that reliably, effectively, portably and quickly, when the total number of purchasers will be in the double or treble digits at most, is a non-starter - say what you like about the blatant stupidity rife in commercial software, but they know a bad bet when they see one. You will never see something with that degree of intelligence come out of PCG or Green Hills - if they didn't go bankrupt making it, they'd go bankrupt from the unsold stock, and they know it.

What about a free/open source version? GCC already has some of the key ingredients needed, after all. Aside from the fact that the GCC developers are not known for their speed or responsiveness - particularly to arcane problems - it would take many days to compile even SuperTuxKart and probably months when it came to X11, glibc or even the Linux kernel. This is far longer than the lifetime of most of the source packages - they've usually been patched on that sort of timeframe at least once. The resulting binaries might even be truly perfectly parallel, but they'd still be obsolete. You'd have to do some very heavy research into compiler theory to get GCC fast enough and powerful enough to tackle such problems within the lifetime of the product being compiled. Hey, I'm not saying GCC is bad - as a sequential, single-pass compiler, it's pretty damn good. At the Supercomputer shows, GCC is used as the benchmark to beat, in terms of code produced. The people at such shows aren't easily impressed and would not take boasts of producing binaries a few percent faster than GCC unless that meant a hell of a lot. But I'm not convinced it'll be the launchpad for a new generation of automatic parallelizing compilers. I think that's going to require someone writing such a compiler from scratch.

Automatic parallelization is unlikely to happen in my lifetime, even though the early research was taking place at about the time I first started primary school. It's a hard problem that isn't being made easier by having been largely avoided.

It is hard (1)

MtHuurne (602934) | more than 7 years ago | (#17968118)

I don't say "never use threads", but I do say "don't use threads unless you have to". Bugs in threaded code are easier to introduce, are hard to see in code reviews, are very likely to remain undetected in manual testing, are likely to remain undetected in automated testing and are a pain to reproduce.

Part of the problem is the inherent difficulty of parallelism, but there is more to it than that. Threads are not isolated in any way: they can take unpredictable paths through the program and touch any data structure. It's a good idea to design your threaded program in such a way that each thread sticks to a small section of the code and data, but this is a structure that you have to impose yourself: the threading mechanism does not help here.

So, in my opinion threads are hard. Sometimes you have to do hard things though. But if you can avoid it, you can save yourself a lot of effort and risk by sticking to a single thread. I think the situation can be compared to memory allocation: noob programmers are sure to mess up explicit allocation/deallocation; experienced programmers get it mostly right, but spend a lot of effort on it and still make mistakes from time to time (especially in the error handling paths). Garbage collection avoids a lot of problems.

For the future, I think we should have some kind of mechanism for parallelism other than threads. It would have to isolate the parallel computations from each other and provide a safe and efficient way to pass control and data between them. I read a bit about Erlang when it was mentioned here a couple of days ago, but it is not yet clear to me if/how this could be fitted into an object oriented language. That would be essential to get it to the masses: functional programming hasn't become mainstream because it doesn't match the way most programmers think.
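For what it's worth, the isolation-plus-message-passing style does fit an object-oriented language without much ceremony. Here is a rough Java sketch of an "actor-ish" object: its state is private, only its own thread touches that state, and the outside world can only send it messages. The class and message names are invented for the example.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // A toy "actor": all state is private, one thread owns it, and the only
    // way in from other threads is the mailbox.
    class CounterActor implements Runnable {
        private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<String>();
        private int count = 0;   // never shared, so never locked

        public void send(String msg) { mailbox.add(msg); }

        public void run() {
            try {
                while (true) {
                    String msg = mailbox.take();   // blocks until a message arrives
                    if (msg.equals("stop")) break;
                    count++;
                    System.out.println("saw '" + msg + "', count = " + count);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        public static void main(String[] args) throws InterruptedException {
            CounterActor actor = new CounterActor();
            Thread t = new Thread(actor);
            t.start();
            actor.send("hello");
            actor.send("world");
            actor.send("stop");
            t.join();
        }
    }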

Re:It is hard (1, Insightful)

Anonymous Coward | more than 7 years ago | (#17968260)

"don't use threads unless you have to" should read: "don't use ANY arbitrary feature unless you have to" i.e. KISS.

Re:It is hard (0)

Anonymous Coward | more than 7 years ago | (#17969816)

"don't use threads unless you have to" should read: "don't use ANY arbitrary feature unless you have to" i.e. KISS.

That's why I never use comments!

Seriously, all features are not created equal. Threads have a relatively high cognitive burden.

Re:It's not hard (2, Interesting)

owlstead (636356) | more than 7 years ago | (#17968340)

For a new C++ with multithreading language extensions (for manually coding the multithreading), a good API and IDE/tool support, look no further. It's called Java and it has been around for ages. You really, really, really don't want multithreading in a non-"managed" language. You don't want to debug an application where *every* part of your application can be messed up by any thread. The advantage of Java is its built-in safety measures.

Things you need to have support for in a language/environment:
- All the usual multithreading design patterns in the API (producer/consumer, stream support, stacks etc);
- Thread safe collections;
- A keyword and mechanism to do locking ("synchronized" in Java, "volatile" is handy for optimizations);
- Memory protection;
- Debuggers that support multi-threading (at least halt-all-threads functionality);
- Very well documented APIs, especially regarding whether classes are thread safe and how to use them;
- If at all possible, automated code-checking tools for multi-threading (they won't catch every multi-threading error, but they are very useful).
Furthermore, it's very important to make sure you can use the APIs on different platforms if you expect to port the application in the future. Also, since threading may be expensive, a lightweight version of threads may be useful as well.

It's not just a matter of adding a language extension for threading; support must go much further than that. The API documentation is especially important in this respect. Unfortunately there are always parts of the API that are not really useful for multithreading, such as the Date classes in Java (they suck in all other respects as well, to be honest).

If you are ever going to try Java, make sure not to use arrays too much. You cannot prevent write access to arrays, so any thread that has a reference to an array may change it at any time. Try to use immutable classes as much as you can (BigInteger, BigDecimal and of course Strings are all immutable), and make defensive copies when needed. In C++ any thread can mess up any data element by simply casting, something that is done *way* too much in C++ anyway. You can also use C#, but a quick Google search suggests that C# multithreading is not used as much as it could be, and getting support might be much more difficult. At least you can make the collections thread safe.
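To make the defensive-copy advice concrete, a tiny sketch of an immutable wrapper around an array; the Samples class is invented for the example. The internal int[] never escapes, so no other thread can ever see it change.

    import java.util.Arrays;

    // Immutable holder for a set of samples: the internal array is copied on
    // the way in and on the way out, so callers can never mutate shared state.
    public final class Samples {
        private final int[] data;

        public Samples(int[] data) {
            this.data = data.clone();   // defensive copy in
        }

        public int[] toArray() {
            return data.clone();        // defensive copy out
        }

        public int get(int i) {
            return data[i];             // reads need no lock: nothing ever changes
        }

        public static void main(String[] args) {
            int[] raw = {3, 1, 4};
            Samples s = new Samples(raw);
            raw[0] = 99;                // the caller mutates its own array...
            System.out.println(Arrays.toString(s.toArray()));  // ...but this still prints [3, 1, 4]
        }
    }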

Re:It's not hard (1)

drolli (522659) | more than 7 years ago | (#17969932)

> Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language
> (maybe c++1x) comes along the current ones work just fine.

The sad reality is: unless there is an AI, there will be no "magical threading language". That is because designing a multi-threaded program starts with isolating the object classes and instances which are *likely* to be used for communication. Isolation and designing an "orthogonal" flow of information are much more essential for multi-threaded code than for single-threaded code. If designed correctly, one can use the standard techniques on each of the information flows to protect against most of the problems. As it goes, reality is not ideal. A lot of code has extraordinarily badly designed internal communication mechanisms, which leaves three possibilities:

1) Redesign the code (Um, yes...)

2) Accept the design flaws and fix the crashes until they are not annoying any more

3) Accept that, while your program may be multi-threaded, effectively only 1.0 to 1.5 threads are running at the same time due to your "global locks", and, as in 2), make the probability of deadlock small enough to be non-annoying

BTW: I do not consider 2) and 3) to be good programming.

Hmmm... (5, Interesting)

ardor (673957) | more than 7 years ago | (#17967216)

"but is likely to face diminishing returns as 16 and 32 processor systems are realized"

Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?

Re:Hmmm... (1)

Tibor the Hun (143056) | more than 7 years ago | (#17967268)

Granularity.
Neurons only work on miniscule problems. Pretty much like a single gate.
Our programs are not yet capable of dividing problems into such small pieces.

Plus, so far most of our parallelization efforts are focused on optimizing software. Brains are dedicated wetware machines.

Re:Hmmm... (1)

ardor (673957) | more than 7 years ago | (#17967340)

In this case, the current parallelization efforts miss the point. Is there actual research into CPUs consisting of billions of minuscule neuron-like units? Something like neural net hardware? Maybe these would fare better than pure software ones...

Re:Hmmm... (1)

Tibor the Hun (143056) | more than 7 years ago | (#17967364)

I haven't heard of any hardware such as that, but I'd guess it'd be possible to do with FPGAs.
Surely someone here could enlighten us.

Re:Hmmm... (1)

Simon80 (874052) | more than 7 years ago | (#17967614)

One of my university profs does neural network research with FPGAs, perhaps that qualifies.

Re:Hmmm... (1)

colganc (581174) | more than 7 years ago | (#17967274)

I doubt it has anything to do with what we are missing as opposed to optimizing for the kind of hardware that exists currently. I imagine current parallel techniques have, relatively speaking, low overhead, and to take advantage of computers with high processor counts you move to higher-overhead methods that can take advantage of more of the processing power.

Re:Hmmm... (4, Funny)

CosmeticLobotamy (155360) | more than 7 years ago | (#17967362)

Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?

Brain scalability is just not that great. Trust me, putting more than four brains in one head is just asking for locking problems out the yin-yang.

Re:Hmmm... (1)

jimmydevice (699057) | more than 7 years ago | (#17967570)

Putting four brains in the same *room* is asking for problems.

Re:Hmmm... (1)

edschurr (999028) | more than 7 years ago | (#17967456)

Isn't the brain pretty much entirely in the domain of imprecise pattern recognition? Maybe massive parallelism isn't useful for many types of tasks. I don't have the imagination to think about it right now.

Well, one thing that I can think of is having extra threads gamble on what problems might need solving in the future, and doing them ahead of time. Diminishing returns still feels very relevant...

Re:Hmmm... (1)

imsabbel (611519) | more than 7 years ago | (#17967564)

Yeah.
You are missing that our super-great brain isn't even able to calculate more than 1 or 2 additions per second.
Meaning: the parallelism that's at work in our brain is absolutely useless for the things we want to do with a computer.

Re:Hmmm... (1)

ardor (673957) | more than 7 years ago | (#17967742)

As mentioned before, it is all a matter of how the problem is viewed. If you can translate the addition into an abstract visual model, then the brain is very quick.

Re:Hmmm... (0)

Anonymous Coward | more than 7 years ago | (#17968370)

At the abstract level of conscious thought, addition is "slow", yes. But your brain calculates things quickly all the time and is massively parallel. You simply do not have awareness of such low-level calculations inside your head. That's what [probably] allows for consciousness.

Re:Hmmm... (1)

gumbi west (610122) | more than 7 years ago | (#17968534)

yet a baseball player can calculate the exact force needed to throw a baseball from center field to home plate. Here the angle and force are even overspecified, and so there is a highly non-linear optimization to perform to get the minimum possible travel time for the throw. There is also an error in angle and force, so a loss function must be used to trade off accuracy for speed, in addition to how important the throw is, which dictates how probable an injury should be allowed to be. If you had a machine that let you throw the ball with all these factors, it could easily take you an hour to do the calculation. Yet the baseball player takes only a fraction of a second to do it all.

So how long does it take to do the calculation?

Re:Hmmm... (1)

timeOday (582209) | more than 7 years ago | (#17969108)

yet a baseball player can calculate the exact force needed to throw a baseball from center field to home plate.
No he does not. He simply learns the mapping from stimuli to actions. He's not calculating anything. That's why pitchers pitch instead of studying physics books.

Re:Hmmm... (2, Interesting)

philipgar (595691) | more than 7 years ago | (#17967632)

Is this really true? Of course for some tasks the massive parallelism of the human brain works great. The brain can analyze complex images extremely fast, comparing them in parallel to its own internal database of images, using fuzzy reasoning to detect if something is familiar, etc. However you give your brain complex math problems, and it can spend seconds, minutes, or even hours to solve it, sometimes requiring extra scratch memory to solve. This is due to bad programming in the brain that sucks at doing certain computations (except for the rare people who manage to do these problems fast).

This is also true for computers. Big scientific problems can use thousands of processors; just look at some of the problems running on supercomputers. However, not all problems scale up to effectively use thousands of processors. Many desktop applications won't be able to take advantage of the power; then again, many of these applications don't need to take advantage of that power.

Basically computers will continue to evolve, with custom hardware or cores that can adapt to the problem. If there's enough demand to solve a problem, it can be done; however, doing fast software designs will not be cheap, and "hardware" designs (Verilog, VHDL, etc.) will be even more expensive (even if they don't need to spin silicon for them). Still, for applications that have sufficient demand, they will be done. They're just highly time-intensive to do. Luckily there will always be outsourcing to help us out.

Phil

Re:Hmmm... (1)

ardor (673957) | more than 7 years ago | (#17967712)

Yes, the brain works visually, and this is the problem with math. Mathematicians often say that they imagine the mathematical problems in a visual way, but so abstract that it cannot be painted on paper. So the actual problem is the translation.

Re:Hmmm... (1)

zsau (266209) | more than 7 years ago | (#17969168)

However you give your brain complex math problems, and it can spend seconds, minutes, or even hours to solve it, sometimes requiring extra scratch memory to solve.

You give an x86 computer PPC binaries and ask it to run them and they'll be slow. Your complex maths problems aren't in your brain's natural format. But anyone (experienced with driving a particular car) can tell you just how hard they'll need to brake to stop, or how fast they can go round that corner safely. And they can do this whilst telling their kids in the back to shut up! These are pretty complex maths problems. Even simpler ones can be done in no time at all: uneducated people who run market stalls can calculate the change from $50 for sixteen figs very quickly, yet you ask them what 50-(cost-of-fig * 15) is, and they'll take a lot longer.

The brain is massively parallel; it's just that attention is more limited, and it takes attention to process abstract problems. And, of course, your brain has been optimised for particular tasks. We can still learn a lot from it when solving different problems

(Also, the computations your brain can do are a separate issue from its memory. Your short-term memory is very limited---sevenish items, but of almost arbitrary size ("123632445634579861" is hard to store in STM, but "123 632 445 634 579 861" is much easier)---whereas your long-term memory is much slower and less useful when you're performing computations.)

Re:Hmmm... (1)

LordLucless (582312) | more than 7 years ago | (#17968320)

It depends. Our brains are vastly more complicated than a computer CPU, but they're not designed to do the same thing. There was an interesting explanation of this in a fiction book I read a while back that talked a little about AI.

Basically, it said the human brain was designed for algorithmic complexity, not speed. If you take a brain, and make it ten times faster, it won't be any more intelligent. It'll just take 1/10th of the time to get the same answer as it used to, it still won't be able to solve any more complex problems.

What we do with computer processors is the opposite direction - we just pile up more and more speed. When we want complicated behavior from a CPU, we throw more speed at it and brute-force it. If we want our next-gen parallelized machines to be efficient, we'll have to make changes to our algorithms.

LabVIEW and Parallelism (4, Insightful)

SparhawkA (608965) | more than 7 years ago | (#17967236)

Take a look at LabVIEW, a compiled graphical programming language from National Instruments. It natively supports SMP / multicore / multithreading. Essentially, dissociated pieces of code you write (computations, hardware I/O, etc.) are automatically scheduled in separate threads of execution in order to maximize efficiency. It's an interesting idea: here's a technical article from their website that does a better job of describing it (some marketing included as well): http://zone.ni.com/devzone/cda/tut/p/id/4233 [ni.com]

Re:LabVIEW and Parallelism (1)

Doppler00 (534739) | more than 7 years ago | (#17967636)

As someone who has used LabVIEW extensively I have to say it kind of sucks. They claim "self documenting" and things like that. If you've ever seen a real LabVIEW program, you'll see that it's a mess, and very difficult to understand and debug. It doesn't help that it wasn't object-oriented until version 8. You STILL can't spawn 0 to N threads, and its built-in library support is very weak (simple things like list manipulation, queues, string processing, etc. don't exist!).

It's a niche product meant for data acquisition systems in experimentation.

And you're also likely to have sequential sections of code even with LabVIEW. This isn't really a language problem so much as it is an algorithms problem. There are just too many iterative algorithms that require a fast CPU to execute things sequentially.

Re:LabVIEW and Parallelism (0)

Anonymous Coward | more than 7 years ago | (#17967728)

That's because it's essentially a functional language.

I believe this to be a contradiction in the conventional wisdom of programming languages:
- functional languages are hard! let's stick with Algol-derived languages
- threads are hard! avoid them if at all possible
- but multiprocessing is practically free in a functional language

(Do I sound bitter? Sorry. My coworkers seem to have made it through college CS programs without ever taking a course in FP, or correctness proofs, so I spend too much time straining spaghetti code.)

Re:LabVIEW and Parallelism (1)

the_brobdingnagian (917699) | more than 7 years ago | (#17967794)

LabVIEW itself would not be useful for high performance computing because it's made for data acquisition. The concept of programming with blocks and wires forces you to split your "code" into independent pieces which can be executed independently. Maybe an add-on to LabVIEW, or just a similar language written for high performance computing, could make it very easy to create great parallel software.

From TFA (5, Funny)

obender (546976) | more than 7 years ago | (#17967240)

The target should be 1000s of cores per chip
640 cores should be enough for anyone.

Erlang: The Movie (2, Funny)

djberg96 (133496) | more than 7 years ago | (#17967264)

In case anyone has missed it: http://video.google.com/videoplay?docid=-5830318882717959520 [google.com]

I can't wait for the sequel!

Re:Erlang: The Movie (0)

Anonymous Coward | more than 7 years ago | (#17968506)

Hello Joe, Hello Mike.

Classic stuff.

Express What, Not How (3, Informative)

RAMMS+EIN (578166) | more than 7 years ago | (#17967312)

I think parallelism can be achieved elegantly using languages that express what is to be done, rather than how it is to be done. Functional programming is a major step in the right direction. Not only do functional programs typically more clearly express what is to be done (as opposed to which steps are to be taken to get there), they also tend to cause fewer side effects (which restrict the correct evaluation orders). In particular, not using variables avoids many of the headaches traditionally involved in multithreading.

Re:Express What, Not How (2, Insightful)

ardor (673957) | more than 7 years ago | (#17967386)

Functional languages are no silver bullet, however. Things like I/O do not fit well in there. Yes, there are solutions for this, but they tend to be overly complicated. A hybrid functional/imperative language with safeguards for side-effects of the imperative parts seems to be the way to go.

Re:Express What, Not How (1)

MajroMax (112652) | more than 7 years ago | (#17967476)

A hybrid functional/imperative language with safeguards for side-effects of the imperative parts seems to be the way to go.

You just described Common Lisp.

Re:Express What, Not How (1)

Eli Gottlieb (917758) | more than 7 years ago | (#17967548)

Where does Common Lisp protect side-effects?

Re:Express What, Not How (1)

The_Wilschon (782534) | more than 7 years ago | (#17968160)

I suppose by usually having two different functions, one which works by side effects, and one which does the same thing except without side effects of any kind. Scheme does an even better job of this with the bang convention (adding a ! to the end of a function name to denote that this version is destructive).

Not sure if that is what you mean, and it probably isn't quite good enough, but it is perhaps the right direction?

Re:Express What, Not How (1)

WaffleKing16 (565438) | more than 7 years ago | (#17967746)

http://haskell.org/ [haskell.org] No side effects and it has a great way of dealing with IO. Plus the newest version of GHC [haskell.org] has some great new features for writing parallel programs.

Re:Express What, Not How (1)

Dorceon (928997) | more than 7 years ago | (#17968996)

Church devised the Lambda Calculus and was largely ignored. Then Turing devised the Turing Machine. Once everyone accepted the Turing Machine, Turing proved that it was equivalent to the Lambda Calculus. In other words, functional programming wasn't a step in the right direction--procedural programming was a step in the wrong direction!

massively parallel CPUs (0)

Anonymous Coward | more than 7 years ago | (#17967334)

Right, massively parallel computing doesn't work? And, this is from a research institution? And, research institutions don't use massively parallel computers? Right.

Hmmm, seems to me that some CPU makers, just like car makers, would like to continue selling you legacy hardware. I am not buying. Commoditized massively parallel CPUs are a step in the right direction. This is the way that RAM is sold today. The same works with CPUs. In fact, I would go a little further and combine the two, CPU with RAM, in a single commodity. $5 per CPU means I can buy hundreds of CPUs for the same price as that single monolithic legacy CPU that you have little choice about today. I.e., the desktop computer company lets you choose the amount of RAM and the size of the hard disk, but they don't let you choose which CPUs. I think that this is all about to change.

Sure, desktop apps may get diminishing returns... (1)

GoldTeamRules (639624) | more than 7 years ago | (#17967444)

But, service based applications will soar. As a web developer, I'm thrilled by the prospect of being able to run my web application on 16 and 32 core machines. I could care less about whether office apps run any more quickly.

Re:Sure, desktop apps may get diminishing returns. (1)

gbjbaanb (229885) | more than 7 years ago | (#17967496)

This isn't bringing parallelisation to your web app for free; instead you're getting 1 CPU just like you used to have. The difference is that the server can handle many more users, with new limits of memory and disk I/O, of course.

Never underestimate the virus writers. (1)

DrYak (748999) | more than 7 years ago | (#17968236)

run my web application on 16 and 32 core machines. I could care less about whether office apps run any more quickly.

Never underestimate the huge quantity of virus/trojan/spyware/Sony rootkit/spambot/etc. pollution that Joe Six-pack will be running next to his power-hungry Microsoft OS and his main office work.

16/32-core machines could be useful for him too, so his main work (browsing for pr0n?) won't slow down from being constantly interrupted by the multitasking scheduler to let the huge pile of crap run.

---

1 core for Office.
4 cores for the "WoW effect".
11 remaining cores pumping spam.

1 "Intel Core Pento 16 cores" to rule them all.

It's a supercomputer perspective (5, Insightful)

Animats (122034) | more than 7 years ago | (#17967546)

I just heard that talk; he gave it at EE380 at Stanford a few weeks ago.

First, this is a supercomputer guy talking. He's talking about number-crunching. His "13 dwarfs" are mostly number-crunching inner loops. Second, what he's really pushing is getting everybody in academia to do research his way - on FPGA-based rackmount emulators.

Basic truth about supercomputers - the commercial market is zilch. You have to go down to #60 on the list of the top 500 supercomputer [top500.org] before you find the first real commercial customer. It's BMW, and the system is a cluster of 1024 Intel x86 1U servers, running Red Hat Linux. Nothing exotic; just a big server farm set up for computation.

More CPUs will help in server farms, but there we're I/O bound to the outside world, not talking much to neighboring CPUs. If you have hundreds of CPUs on a chip, how do you get data in and out? But we know the answer to that - put 100Gb/s Ethernet controllers on the chip. No major software changes needed.

This brings up one of the other major architectural truths: shared memory multiprocessors are useful, and clusters are useful. Everything in between is a huge pain. Supercomputer guys fuss endlessly over elaborate interconnection schemes, but none of them are worth the trouble. The author of this paper thinks that all the programming headaches of supercomputers will have to be brought down to desktop level, but that's probably not going to happen. What problem would it solve?

What we do get from the latest rounds of shrinkage are better mobile devices. The big wins commercially are in phones, not desktops or laptops. Desktops have been mostly empty space inside for years now. In fact, that's true of most non-mobile consumer electronics. We're getting lower cost and smaller size, rather than more power.

Consider cars. For the first half of the 20th century, the big thing was making engines more powerful. By the 1960s, engine power was a solved problem, (the 1967 turbine-powered Indy car finally settled that issue) and cars really haven't become significantly more powerful since then. (Brakes and suspensions, though, are far better.)

It will be very interesting to see what happens with the Cell. That's the first non-shared memory multiprocessor to be produced in volume. If it turns out to be a dead end, like the Itanium, it may kill off interest in that sort of thing for years.

There are some interesting potential applications for massive parallelism for vision and robotics applications. I expect to see interesting work in that direction. The more successful vision algorithms do much computation, most of which is discarded. That's a proper application for many-CPU machines, though not the Cell, unless it gets more memory per CPU. Tomorrow's robots may have a thousand CPUs. Tomorrow's laptops, probably not.

Re:It's a supercomputer perspective (2, Interesting)

deadline (14171) | more than 7 years ago | (#17967902)

Basic truth about supercomputers - the commercial market is zilch. You have to go down to #60 on the list of the top 500 supercomputer before you find the first real commercial customer.

You may want to adjust your truth, as your measure of the market is wrong. The Top500 is not a marketing survey, and just because you have HPC hardware doesn't mean you run out and try to get it on the Top500. Many companies are using (HPC) parallel cluster computers, but they choose to be quiet about it for competitive reasons. The 2005 HPC market was well over $9 billion, and IDC predicts a 9% AGR, bringing the market to over $14 billion in 2010. Can you tell me what other markets are offering such growth rates these days?

Supercomputer guys fuss endlessly over elaborate interconnection schemes, but none of them are worth the trouble.

If, as most companies have stated, they cannot compete without using HPC [compete.org], then R&D at the high end is indeed worthwhile. The endless fussing eventually makes it to commodity markets; open your computer case and look around.

Re:It's a supercomputer perspective (2, Insightful)

antifoidulus (807088) | more than 7 years ago | (#17968284)

Also keep in mind that many companies aren't interested in Linpack performance per se, at least not to the point of spending a lot of time and effort tweaking their machines to get really high Linpack scores, which is all that matters when it comes to the Top500.

Re:It's a supercomputer perspective (5, Informative)

RecessionCone (1062552) | more than 7 years ago | (#17967986)

I don't think you were listening very carefully to the talk (or know much about computer architecture) if you think Dave Patterson is a supercomputer guy. Perhaps you've heard of the Hennessy & Patterson Quantitative Approach to Computer Architecture book (you know, the one used at basically every university to teach computer architecture). Patterson has been involved in a lot of different things within computer architecture over the years, including being one of the main people behind RISC and RAID (as well as being the president of the ACM).

I saw his talk when it was given at Berkeley, and you really missed the point if you thought it was about supercomputing. The talk was about the future of computing in general, which is increasingly parallel, in case you're unaware of that fact. GPUs are already at 128 cores, network processors are up to 200 cores, and Intel is going to present an 80-core x86 test chip tomorrow at ISSCC. Physics won't support faster single-core processors at the rate we're accustomed to, so the whole industry is going parallel, which is a sea change in the industry.

Patterson's talk is aimed at the research community, since we don't have good answers as to how these very parallel systems should be architected and programmed. FPGA emulation is a great way to play around with massive multiprocessor configurations and programming strategies, which is why Patterson is advocating it (his RAMP project has researchers from MIT, Berkeley, Stanford, Texas, and Washington involved, among others). You also need to have a little more imagination about what we could do with more computing power. Try looking at Intel's presentations on RMS http://www.intel.com/technology/itj/2005/volume09issue02/foreword.htm [intel.com] .

Re:It's a supercomputer perspective (1)

Wesley Felter (138342) | more than 7 years ago | (#17968012)

The author of this paper thinks that all the programming headaches of supercomputers will have to be brought down to desktop level, but that's probably not going to happen. What problem would it solve?

It would solve the problem of software actually being able to run faster on computers that will be produced in the near future. If the status quo continues, desktop software simply won't get any faster, even when computers get faster; this seems like a real waste.

Programming 10,000 Cores (2, Interesting)

deadline (14171) | more than 7 years ago | (#17967630)

Those of us that use HPC clusters (i.e. Beowulf) have been thinking about these issues as well. For those interested, I wrote a series of articles on how one might program 10,000 cores (based on my frustrations as a programmer and user of parallel computers). Things will change; there is no doubt.

The first in the series is called Cluster Programming: You Can't Always Get What You Want [clustermonkey.net] The next two are Cluster Programming: The Ignorance is Bliss Approach [clustermonkey.net] , and Cluster Programming: Explicit Implications of Cluster Computing [clustermonkey.net] .

Comments welcome.


Check out co-array fortran.... (2, Informative)

epiphyte(3) (771115) | more than 7 years ago | (#17967668)

AKA F--, the simplest explicit programming model on the planet. It's the brainchild of Bob Numrich, unsung hero of Cray Research in the early 90's (and probably much before, but that was when I was lucky enough to work with him). F-- was Numrich's second great contribution to parallel programming models; the first was the shmem model for the Cray T3D: four assembly routines which made the raw capabilities of the T3D available to massively parallel applications when every other programming model (e.g. MPI) had about 50x the communication overhead. This was a big factor in Cray's takeover of the short-lived MPP market in the mid 90's.

On the topic of the thread: explicit programming models scale to thousands of processors; implicit ones peter out at 4-8. The reason is data locality. Explicit models ensure that the processor is operating on data which is local and unshared. Implicit models end up fighting for operands with competing processors, which requires either heroic hardware (e.g. 70% of the Cray C-90's processor logic was concerned with memory contention resolution) or a dramatic performance dropoff.
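To make that data-locality point concrete, here is a toy Java sketch (nothing here comes from F-- or shmem; all names are mine). Each worker sums its own private slice of an array, so no processor ever fights another for operands; having every thread update one shared accumulator instead is exactly the contention case described above.

    public class LocalityDemo {
        public static void main(String[] args) throws InterruptedException {
            final double[] data = new double[1 << 20];
            java.util.Arrays.fill(data, 1.0);

            final int nThreads = 4;
            final double[] partial = new double[nThreads];   // one private result slot per worker
            final Thread[] workers = new Thread[nThreads];
            final int chunk = data.length / nThreads;

            for (int t = 0; t < nThreads; t++) {
                final int id = t;
                final int lo = t * chunk;
                final int hi = (t == nThreads - 1) ? data.length : lo + chunk;
                workers[t] = new Thread(new Runnable() {
                    public void run() {
                        double local = 0;                    // private to this worker: no sharing, no contention
                        for (int i = lo; i < hi; i++) local += data[i];
                        partial[id] = local;                 // written exactly once at the end
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) w.join();

            double total = 0;
            for (double p : partial) total += p;
            System.out.println(total);                       // 1048576.0
        }
    }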

Que : Inherently Parallel Programming (2, Interesting)

FMota91 (1050752) | more than 7 years ago | (#17967734)

Actually, I've been working on a programming language/model that makes programs inherently parallel. Of course, it is quite different from anything currently in existence. Basically, it uses a queue (hence the name "Que") to store data (like the stack in FORTH), but due to the nature of the queue, programs become inherently parallel. Large programs could have hundreds of processes running at the same time, if so inclined.

If you are interested, check out my project [sourcefourge.net] (there's not much there right now), and/or contact me at FMota91 at GMail dot com.

Re:Que : Inherently Parallel Programming (0)

Anonymous Coward | more than 7 years ago | (#17967888)

Using queues in parallel languages is not new [mit.edu] . Your project also contains, like, 5 files and was started on January 30. Apparently you're not very far along yet.

Re:Que : Inherently Parallel Programming (1)

FMota91 (1050752) | more than 7 years ago | (#17967996)

Thank you for answering me. I was not aware of that project, but I will look into it to see what I can glean from it. And yes, I have only just started.

Re:Que : Inherently Parallel Programming (0)

Anonymous Coward | more than 7 years ago | (#17968252)

Since you've clearly not done any research into programming languages, I suggest you hold back on calling it "very different from other programming languages".

Re:Que : Inherently Parallel Programming (1)

FMota91 (1050752) | more than 7 years ago | (#17968480)

It is definitely very different from cilk. And I have looked into a fair amount of different programming languages.

cilk looks a lot like C -- in fact, it is basically an extension of C.
Que looks a bit like FORTH. It is not an extension, nor is it programmed anything like FORTH.

cilk has explicit parallelism.
Que has implicit parallelism.

Here's a snippet of Que code (from inside a "word" that duplicates three items on the queue):

( a b c -- a b c a b c )
dup dup dup
pass swap swap pass
pass pass swap pass pass

PS: I believe I have studied [much] more than a dozen programming languages. That's more than most people I know (although admittedly I do not know that many people in CS). Most languages are similar to other languages, with a few exceptions (FORTH, Lisp, almost every esoteric programming language). Que is different from all of those programming languages.

Re:Que : Inherently Parallel Programming (0)

Anonymous Coward | more than 7 years ago | (#17969122)

You should read the paper before you write Cilk off as just a C extension. Specifically, look at the guarantees that Cilk makes about load balance and work allocation, and the work-stealing that gets done between threads to accomplish that. Cilk's parallelism is explicit in that concurrent tasks must be spawned, but implicit in that threads are scheduled intelligently, even on potentially heterogeneous machines.
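(For what it's worth, Java programmers can get a feel for the work-stealing idea from the fork/join framework, java.util.concurrent.ForkJoinPool in current JDKs. A minimal divide-and-conquer sketch with purely illustrative names: the forked half can be stolen by an idle worker while the current worker computes the other half itself.)

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Illustrative divide-and-conquer sum; idle worker threads steal the forked halves.
    class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int lo, hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo < 10000) {               // small enough: just do it sequentially
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                         // roughly Cilk's "spawn": another worker may steal this
            long rightSum = right.compute();     // keep working on the other half ourselves
            return left.join() + rightSum;       // roughly Cilk's "sync"
        }
    }

    public class WorkStealingDemo {
        public static void main(String[] args) {
            long[] data = new long[1000000];
            for (int i = 0; i < data.length; i++) data[i] = i;
            long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
            System.out.println(total);           // 499999500000
        }
    }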

Re:Que : Inherently Parallel Programming (1)

xenocide2 (231786) | more than 7 years ago | (#17968764)

Indeed, I work with TinyOS [tinyos.net], an open source platform for embedded sensor network systems. One of the features is a task queue. Unfortunately, it doesn't support associating data with a given task in the queue, which would make life much easier for me, and much harder for the people authoring the system. The whole thing's written in "nesC", which is basically C with a few extras and some component-oriented things to modularize C.

As a language, its design is to take interrupts and parallelism and make them part of the language. You're allowed to label sections atomic, and the compiler enforces this. Currently it just turns off interrupts, but analysis could improve this by removing useless atomics and perhaps identifying when one atomic section won't interfere with another. For now, though, turning off interrupts is cheap enough that it's hard to justify stronger synchronization primitives at runtime. Currently the system assumes a single task from the queue will be running, so one can guarantee atomicity between tasks, but I don't know if the language/system explicitly states this.

Unfortunately, this article claims to have inspected the subject of embedded systems and parallelism, but aside from some head-nodding to VxWorks, there's not much meat on the subject. Which is probably OK. The paper focuses on compute-bound domains like BLASTp; the stuff we work with is limited more by battery longevity and the speed at which connected devices work. If a 12-bit ADC is going to take some time, we'd rather go into a low-power state if possible. Same goes for writing to SD over i2c. Yet some of the conventional wisdoms the authors put forth are true. Transistors (atmega128s) are cheap, but battery power is not. The cost of going out to the field and replacing a battery could approach the cost of replacing the device. Many deployments just ignore battery failures, since once you have a hundred of these things deployed you're not gonna find the failed one anyways. But there are still a few conventional wisdoms that are just junk for us, such as floating point vs. memory and the potential for faster CPUs. As I noted above, our software isn't trying to solve some compute-intensive problem; it's controlling hardware.

Berkeley View Links (2, Informative)

RecessionCone (1062552) | more than 7 years ago | (#17967778)

For those that are interested, the Berkeley View project website is at http://view.eecs.berkeley.edu/ [berkeley.edu] , which includes some video interviews with the principal professors involved in the project. There is also a blog at http://view.eecs.berkeley.edu/blog/ [berkeley.edu]

Shared-state concurrency is harder than you think. (5, Insightful)

zestyping (928433) | more than 7 years ago | (#17967950)

Reliably achieving even simple goals using concurrent threads that share state is extremely difficult. For example, try this task:

Implement the Observer [wikipedia.org] (aka Listener) pattern (specifically the thing called "Subject" on the Wikipedia page). Your object should provide two methods, publish and subscribe. Clients can call subscribe to indicate their interest in being notified. When a client calls publish with a value, your object should pass on that value by calling the notify method on everyone who has previously subscribed for updates.

Sounds simple, right? But wait:
  • What if one of your subscribers throws an exception? That should not prevent other subscribers from being notified.
  • What if notifying a subscriber triggers another value to be published? All the subscribers must be kept up to date on the latest published value.
  • What if notifying a subscriber triggers another subscription? Whether or not the newly added subscriber receives this in-progress notification is up to you, but it must be well defined and predictable.
  • Oh, and by the way, don't deadlock.
Can you achieve all these things in a multithreaded programming model (e.g. Java)? Try it. Don't feel bad if you can't; it's fiendishly complicated to get right, and I doubt I could do it.

Or, download this paper [erights.org] and start reading from section 3, "The Sequential StatusHolder."

Once you see how hard it is to do something this simple, now think about the complexity of what people regularly try to achieve in multithreaded systems, and that pretty much explains why computer programs freeze up so often.
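To see why, here is a deliberately naive Java sketch (class and method names are mine, not from the paper); each comment marks one of the bullets above going wrong:

    import java.util.ArrayList;
    import java.util.List;

    interface Subscriber {
        void notify(Object value);               // called back with each published value
    }

    // A first attempt most of us would write, and exactly where it breaks.
    class NaiveSubject {
        private final List<Subscriber> subscribers = new ArrayList<Subscriber>();

        public synchronized void subscribe(Subscriber s) {
            subscribers.add(s);
        }

        public synchronized void publish(Object value) {
            for (Subscriber s : subscribers) {
                // 1. If notify() throws, the remaining subscribers are silently skipped.
                // 2. If notify() calls publish() again, the nested publish delivers the new
                //    value first and this loop then resumes delivering the old one, so
                //    "everyone ends on the latest value" is not guaranteed.
                // 3. If notify() calls subscribe(), the lock is reentrant, so the list is
                //    modified mid-iteration: ConcurrentModificationException.
                // 4. We call foreign code while holding our own lock: the classic deadlock setup.
                s.notify(value);
            }
        }
    }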

Re:Shared-state concurrency is harder than you thi (1)

xenocide2 (231786) | more than 7 years ago | (#17969074)

Why does notify() allow for throwing exceptions? The only exception I can see that it throws is IllegalMonitorState, and that seems easy enough to fix. I vaguely recall something about runtime exceptions or whatnot always being throwable, but really, there's nothing the Observer can do about such things except carry on with the rest of the notifications. This is bad for debugging, so one solution would be to hold the exceptions until all notifications have been done and then deal with them all at once. Simply ignoring exceptions isn't an option, multi-threaded or not. If anything, this is an example of how instructional curricula have failed Java programmers.

If publishing triggers a subscriber to trigger another publish, that's either a programming error or intended behavior in my book. A strict reading of the spec you've provided suggests we must notify all subscribers of all publishes, not merely the newest. This eliminates a possible optimization should this trigger not occur every time.

My approach, without reading the paper, would be to make a copy of the subscriber list on a publish. Synchronizing add and copy should be as simple as synchronizing on the subscriber data structure. If this is prone to deadlock, I'm afraid you shouldn't be using Java.

Re:Shared-state concurrency is harder than you thi (1)

zestyping (928433) | more than 7 years ago | (#17969572)

If publishing triggers a subscriber to trigger another publish, that's either a programming error or intended behavior in my book. A strict reading of the spec you've provided suggests we must notify all subscribers of all publishes, not merely the newest.
No problem with that. For correctness we just need to make sure that the last update received by each subscriber contains the latest published value.

My approach, without reading the paper, would be to make a copy of the subscriber list on a publish.
Indeed, this is exactly the solution considered in the discussion (section 4.2 of the paper). But if you have a notification loop like this:

        for (Subscriber subscriber : subscribers) {
            subscriber.notify(value);
        }

and it isn't synchronized, the notifications can arrive out of order.

I'm not saying you can't solve this problem. The point is, it's much trickier than almost all programmers are taught to realize. (This problem is easy to solve if you use event queues instead of multiple threads.)
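For the curious, here is a minimal Java sketch of the event-queue style (names are mine; a single-threaded executor plays the role of the event queue, so the Subject's state is only ever touched by one thread and delivery order is simply queue order):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    interface Subscriber {
        void notify(Object value);
    }

    // Every operation becomes an event, run one at a time and in order on a single thread.
    class QueuedSubject {
        private final ExecutorService queue = Executors.newSingleThreadExecutor();
        private final List<Subscriber> subscribers = new ArrayList<Subscriber>();

        public void subscribe(final Subscriber s) {
            queue.execute(new Runnable() {
                public void run() { subscribers.add(s); }
            });
        }

        public void publish(final Object value) {
            queue.execute(new Runnable() {
                public void run() {
                    // A publish() or subscribe() triggered from inside notify() just enqueues
                    // another event; it cannot deadlock or corrupt the list, and a new
                    // subscriber predictably joins in starting with the next publish.
                    for (Subscriber s : new ArrayList<Subscriber>(subscribers)) {
                        try {
                            s.notify(value);
                        } catch (RuntimeException e) {
                            // One failing subscriber must not silence the rest.
                        }
                    }
                }
            });
        }
    }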

Systems won't be so homogenius (1)

BlueCoder (223005) | more than 7 years ago | (#17967988)

As parallelism ramps up, the processor cores will get simpler and we will return to the concept of RISC computing. But the programming paradigm will be virtual processors. Furthermore, some processors will be better at certain operations than others.

What will eventually turn into an AI subsystem will postcompile and dynamically rewrite programs to take best advantage of the system. By AI I don't mean anything sentient, just something that intelligently recognizes programming structures and patterns and restructures the code to match the hardware. Think IA64 on steroids.

Re:Systems won't be so homogenius (0)

Anonymous Coward | more than 7 years ago | (#17968958)

I'm trying to understand what you mean, but it sounds like a string of technical-sounding words put together to seem intelligent. RISC computing is simply about reducing the instruction set and addressing modes available to the programmer. I don't see how concurrency is a CISC/RISC issue - it's a software design issue.


There is no one right answer (3, Insightful)

kwahoo (899178) | more than 7 years ago | (#17968902)

...if there were, the language wars of the 80s and 90s would have produced an answer. And what new language caught on? Not Sisal, or C*, or Multilisp, etc. It was Java. And C, C++, and Fortran are still going strong.

Part of the problem, as previous posts have observed, is that most people didn't have much incentive to change, since parallel systems were expensive, and bloated, inefficient code would inevitably get faster thanks to the rapid improvement in single-thread performance that we enjoyed until recently. So outside of HPC and cluster apps, most parallelism consisted of decoupling obviously asynchronous tasks.

I don't think there ever will be one language to rule them all.... The right programming model is too dependent on the application, and unless you are designing a domain-specific system, you will never get people to agree. Depending on your needs, you want different language features and you make different tradeoffs on performance vs. programmability. For some applications, functional programming languages will be perfect, for others Co-Array Fortran will be, for others an OO derivative like Mentat will be, etc. And as new applications come to the fore, new languages will continue to spawn.

I think the key is to:

  • Do your best to estimate what range of applications your chip will need to support (so you could imagine that chips for desktops/workstations might diverge from those for e-commerce datacenters--which we already see to a mild extent)
  • Deduce what range of programming language features you need to support
  • Do your best to design an architecture that is flexible enough to support that, and hopefully not close off future ideas that would make programming easier.

If one programming model does triumph, I would predict that it will be APIs that can be used equally well from C, Fortran, Java, etc., thus allowing different application domains to use their preferred APIs. And even that model is probably not compelling enough to bring them all and in the dark bind them....
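As a rough illustration of that API-first prediction, here is an entirely hypothetical Java sketch (ParLib, parallelFor, and IntBody are made-up names, not an existing library); the point is that the caller sees one library call and no threads, and the same entry point could be mirrored in C or Fortran bindings:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical "ParLib": the application sees one library call and no threads.
    public final class ParLib {
        private static final ExecutorService POOL =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        public interface IntBody {
            void apply(int i);
        }

        // Run body.apply(i) for lo <= i < hi, split across the pool, and wait for completion.
        public static void parallelFor(int lo, int hi, final IntBody body) throws InterruptedException {
            if (hi <= lo) return;
            int n = Runtime.getRuntime().availableProcessors();
            final int chunk = Math.max(1, (hi - lo + n - 1) / n);
            int tasks = (hi - lo + chunk - 1) / chunk;
            final CountDownLatch done = new CountDownLatch(tasks);
            for (int start = lo; start < hi; start += chunk) {
                final int s = start;
                final int e = Math.min(hi, start + chunk);
                POOL.execute(new Runnable() {
                    public void run() {
                        try {
                            for (int i = s; i < e; i++) body.apply(i);
                        } finally {
                            done.countDown();
                        }
                    }
                });
            }
            done.await();
        }
    }

A caller would just write ParLib.parallelFor(0, a.length, body) with the loop body; whether that is the right granularity of API is exactly the kind of tradeoff described above.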

what about I/O and mosix style clustering? (3, Interesting)

soldack (48581) | more than 7 years ago | (#17969224)

I used to work for SilverStorm (recently purchased by QLogic). They make InfiniBand switches and software for use in high performance computing and enterprise database clustering. The quality of the I/O subsystem of a cluster played a large part in determining its overall performance. Latency (down to the microsecond) and bandwidth (over 10 gigabits per second) both mattered.

Also, we found that sometimes what made a deal go through was how well your proposed system could run some preexisting software. For example, vendors would publish how well they could run a standard crash test simulation.

Also, I would like to see more research put into making clustered operating systems like mosix good enough so that developers can stick to what they have learned on traditional SMP systems and have their code just work on large clusters. I don't think that multicore processors eliminate the need for better cluster software.

Summary is misleading (1)

mrshoe (697123) | more than 7 years ago | (#17969574)

the 'evolutionary approach to parallel hardware and software may work from 2- or 8-processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized'

I saw Patterson give this talk at PARC. While he did say something similar to the above quote, he was NOT suggesting that fewer, more powerful processors are the future (which is the impression I got from the summary). In fact, the entire talk was to the opposite point. Parallelism is the future; the processor is the new transistor; processors will have thousands of cores, etc. Very fascinating stuff. I hope more researchers participate and use his emulation system, because I think he's correct (as he has been in the past) that we will need to adopt parallelism more in the future.

Re:Summary is misleading (1)

mshurpik (198339) | more than 7 years ago | (#17969878)

Well yeah, but can you parallelize to 128 processors as well as you can to 4 processors? I doubt it. Parallelism is an art, and if the future of computing is based on art, then we've left Moore's Law at the station.