
SW Weenies: Ready for CMT?

Hemos posted more than 8 years ago | from the step-on-up dept.

Software 378

tbray writes "The hardware guys are getting ready to toss this big hairy package over the wall: CMT (Chip Multi Threading) and TLP (Thread Level Parallelism). Think about a chip that isn't that fast but runs 32 threads in hardware. This year, more threads next year. How do you make your code run fast? Anyhow, I was just at a high-level Sun meeting about this stuff, and we don't know the answers, but I pulled together some of the questions."


378 comments

umm (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#12801878)

SW Weenies? Huh?

Windows Articles, Slashdot and Pragmatism (1, Offtopic)

repruhsent (672799) | more than 8 years ago | (#12801883)

Why do the Slashdot editors even bother posting articles about windows?

I was looking at the front page just now and saw an article entitled "Security Patch Creation at Microsoft." [slashdot.org] I'm sure everyone knows that an article like this would obviously be nothing but Taco's blatant attempt at starting a flame war, which usually just ends up in the Linux Zealots arguing amongst themselves about how fucked up Microsoft's patching techniques are.

Comments in articles like these fall into a few categories; the type that constructively criticizes Microsoft's practices (few and far between); the type that says this type of security procedure would never happen with open source (more frequent); and the most prevalent type, "LOL linux rulez!!!!!!!!!111111111onehundredeleven" (about 99.999% of the comments).

I've built countless machines over the years and have helped to build and administer quite a few networks. Linux is good for a server, true, but as far as client operating systems go, XP is pretty decent. Sure, the licensing procedures are draconian, I'll give the Linux Zealots that, but aside from that, the system is stable, secure (if a competent admin oversees it) and fulfills most technological needs. For a home file server or print server, nothing beats Linux (as of yet), only because I can't think of too many home users with $500 USD to fork over for a license for Windows 2003 Server.

Some of us are software pragmatists. I use whatever is best to get the job done. I prefer Mac OS X over every other operating system I've used, but I know that there are some things it cannot do. If I want to play games, I fire up an XP machine. If I want to serve clients or write code for rather ugly windowing toolkits [gtk.org] , I use Linux. If I want a stable Unix machine that works without me taking a week off from school to configure it, I use Mac OS X.

I thought the reason we all got into computing is because we loved it, not to convert people. That is a job for fanatics. We should concentrate more on what's best for a job and not on what is fashionable to the FSF and our Linux Zealot buddies. I think it's time for everyone here to grow the fuck up.

Re:Windows Articles, Slashdot and Pragmatism (0, Offtopic)

Rwilson500 (822764) | more than 8 years ago | (#12801940)

I agree with you on using the best tool for the job, but what does this have to do with the actual article?

Re:Windows Articles, Slashdot and Pragmatism (0)

Anonymous Coward | more than 8 years ago | (#12802095)

Grow up, it's you who constantly take cheap stabs at Linux.

Re:Windows Articles, Slashdot and Pragmatism (1)

gabebear (251933) | more than 8 years ago | (#12802109)

Wow, I've been noticing some out of place posts on Slashdot for a couple days now but this one just proves Slashdot has a serious problem.

I'm sure you didn't mean to, but your post ended up showing up as the first post in an article about CMT [slashdot.org] . What's really weird is that it showed up after a bunch of other posts...

Ready for CMT? Hell no! (3, Funny)

iostream_dot_h (824999) | more than 8 years ago | (#12801905)

Now my hardware will force me to support CMT [cmt.com] on my computer? This is as bad as DRM.

Re:Ready for CMT? Hell no! (2, Funny)

ksheff (2406) | more than 8 years ago | (#12802005)

CMT is manufactured pop-country music at its worst. Yuck!

Re:Ready for CMT? Hell no! (3, Funny)

NoData (9132) | more than 8 years ago | (#12802219)

Seriously! And why foist this garbage on the Star Wars (SW) weenies? Has John Williams gone country?

Schism Growing (2, Insightful)

SirCyn (694031) | more than 8 years ago | (#12801909)

I see a deep schism growing in the processor industry. There are two main camps: the parallel processors and the screaming single processors.

The parallel are used for intense processing. Research, servers, clusters, databases; anything that can be divided into many little jobs and run in parallel.

The other camp is the average user who just wants fast response time and to play Doom 3 at 100+ fps.

Re:Schism Growing (2, Insightful)

GoatMonkey2112 (875417) | more than 8 years ago | (#12801952)

This will go away once there are games that take advantage of multiple processors. Eventually the game user will start to see the advantage of multiple processors. It's already starting to become clear when you look at the architectures of the next generation consoles.

how much for the best of both worlds? (1)

nounderscores (246517) | more than 8 years ago | (#12802032)

If price were no object, someone could design a chip with more than two cores, each of which still ran as fast as any single-core chip out there.

Just the existence of one such device would heal the rift immediately. Everyone would say... aha! It is only a matter of time before blazing speeds and hardware threading come to the desktop.

Re:Schism Growing (4, Interesting)

timford (828049) | more than 8 years ago | (#12802138)

You're right that the latest generation console CPU architectures reflect the trend of concurrent thread execution. That said, however, there seems to be a parallel trend developing that involves separating the general purpose CPU into independent single-purpose processors.

The most obvious example of this is the GPU, which has been around for a long time. The latest moves toward this trend rumored to be in development are PPUs, Physics Processing Units. How long until game AI evolves enough that we have the need for AIPUs also?

This approach obviously doesn't make too much sense in a general purpose computer because the space of possible applications and types of code to be run are just too large. It makes perfect sense in computers that are built especially to run games though, because we have a very good idea of the different kinds of code most games will have to run. This approach allows each type of code to be run on a processor that is most efficient at that type of code, e.g. graphics code being run on processors that provide a ton of parallel pipelines.

Re:Schism Growing (1)

udderly (890305) | more than 8 years ago | (#12801965)

I see a deep schism growing in the processor industry. There are two main camps: the parallel processors and the screaming single processors.

I would like to have a parallel processor for my servers and a single processor to do video rendering. Is there a downside that I'm missing?

Re:Schism Growing (0)

Anonymous Coward | more than 8 years ago | (#12801994)

The other camp is the average user who just wants fast response time and to play Doom 3 at 100+ fps.

I think some of the people in this "camp" who put together gaming PCs to squeeze out every frame they can get would not appreciate being referred to as "average".

Re:Schism Growing (5, Interesting)

philipgar (595691) | more than 8 years ago | (#12802060)

Actually, from what I've heard, the entire industry is moving in this direction. The whole idea of out-of-order processors (OOP) has become outdated. OOP was great: it enabled massive single-threaded performance, but the costs (in terms of area and heat dissipation) are enormous.

I just came back from the DaMoN [cmu.edu] workshop where the keynote was delivered by one of the lead P4 developers. He explained the future of microprocessors and said that the 10-15% extra performance that OOP enables just isn't worth it. The Pentium 4 has 3 issue units, but the way things are, it rarely issues more than 1 instruction per cycle.

We can squeeze more performance out of them, but not much. The easiest method is to go dual core. However, if an application must be multithreaded to get the best performance, what would you rather have: 2 highly advanced cores, or 8-10 simple cores that can each issue half as many instructions per cycle as the dual-core design? Then consider that each core runs 4 threads (switching on a cache miss/access). It doesn't take a rocket scientist to see that overall throughput is improved with this.

The other option is the hybrid core. A single really fast x86 core combined with multiple simpler x86 cores. That way single threaded apps can run fast (until they're converted) and you can get overall throughput from the system without blowing away your power budget on OOP optimizations.

Granted, most of this is in the future (within the next 5 years), but IBM's going that way (a la Cell), it's within Intel's roadmap, Sun is pushing that route, etc. I assume AMD has plans to create a supercomputer on a chip... unless they wish to be obsoleted.

Phil

Don't worry (2, Informative)

StupidKatz (467476) | more than 8 years ago | (#12802062)

You can have your parallel processors and still play DOOM III at insane fps. At worst, it will just take a bit for folks to start writing programs to take advantage of the additional processors/cores.

BTW, your "average" user hasn't even played DOOM I, let alone DOOM III. Surfing the web and using e-mail doesn't usually put a lot of strain on a PC.

Re:Schism Growing (1)

selderrr (523988) | more than 8 years ago | (#12802065)

l33ts who want Doom3 at 100+fps can also benefit from massive parallelism: the graphics are offloaded to the GPU anyway, so what's left for the CPU is projectile & object positioning, and AI.

imagine a future PC with 32656 CPUs, all running at a measly 40MHz, but each one dedicated to a single object in the game. All they have to do is calc the position of that single object. Might give some interesting results

Re:Schism Growing (1)

timford (828049) | more than 8 years ago | (#12802177)

Just as a slight correction, the majority of CPU work done in games like Doom3 is also graphics-related, despite the existence of the GPU. The CPU has to take care of setting up all the data to be fed to the GPU... for example calculating shadow volumes, applying bone transformations to skin vertices, etc.

imagine a future PC with 32656 CPUs, all running at a measly 40MHz, but each one dedicated to a single object in the game. All they have to do is calc the position of that single object. Might give some interesting results

This would be horrible IMHO. The vast amount of information that would need to be passed among all the processors would dwarf the actual game code.

Re:Schism Growing (1)

rpresser (610529) | more than 8 years ago | (#12802353)

imagine a future PC with 32656 CPUs,

What are the other 111 processors doing (32656 + 111 = 2^15-1)? Enforcing DRM?

--
Why the heck doesn't slashcode let me use <sup> and <sub>?

Niagara Myths (4, Insightful)

turgid (580780) | more than 8 years ago | (#12801920)

I am totally not privy to clock-rate numbers, but I see that Paul Murphy is claiming over on ZDNet that it runs at 1.4GHz.
Whatever the clock rate, multiply it by eight and it's pretty obvious that this puppy is going to be able to pump through a whole lot of instructions in aggregate.

Ho hum.

On a good day, with a following wind, Niagara might be able to do 8 integer instructions per second, provided it has 8 independent threads not blocking on I/O to execute.

It only has one floating-point execution unit attached to one of those 8 cores, so if you have a thread that needs to do some FP, it has to make its way over to that core and then has to be scheduled to be executed, and then it can only do one floating-point instruction.

Superb.

The thing is, all of the other CPU vendors with have super-scalar, out-of-order 2- and 4- core 64- bit processors running at over twice to three times the clock frequency.

You do the mathematics.

Argh! (2, Informative)

turgid (580780) | more than 8 years ago | (#12801942)

Today I have diarrhea in the guts as well as the mind. I should have previewed that before I posted it.

On a good day, with a following wind, Niagara might be able to do 8 integer instructions per second, I meant per clock cycle, of course, not per second.

The thing is, all of the other CPU vendors with have

I meant "will have" not "with have".

/me LARTS himself with a big stick.

Re:Argh! (1)

deetsay (703600) | more than 8 years ago | (#12802151)

On a good day, with a following wind, Niagara might be able to do 8 integer instructions per second
Combine that beast of a processor with Longhorn, and pushing out one KILOBYTE will be just a matter of seconds...

Shame (3, Interesting)

gr8_phk (621180) | more than 8 years ago | (#12802146)

That's really a shame about the FP performance. My hobby project is ray tracing, and my code is just waiting to be run on parallel hardware. The preferred system would have multiple cores sharing cache, but separate caches would be fine too. Memory is not the bottleneck, so higher GHz and more cores/threads will be very welcome so long as they each have good performance. The code scales well with multiple CPUs, as pixels can be rendered in parallel with zero effort - the code was designed for that. As it sits, I'm hoping my Shuttle (SN95G5v2) will support an AMD64x2 shortly. We're still not up for RT Quake, but interactive (read: very jerky 1-2 fps) high-poly scenes are possible today.

Re:Niagara Myths (3, Insightful)

Shalda (560388) | more than 8 years ago | (#12802185)

Well, as you might expect, Sun has only a server mentality. The typical server runs few floating point instructions. In a lot of ways, Niagara would be very good at crunching through a database or serving up web pages. On the other hand, such a processor would be worthless on a desktop or a research cluster. I'd like to see actual real-world performance on these processors. I'd also like to see what Oracle charges them for a license. :)

Re:Niagara Myths (3, Funny)

rwyoder (759998) | more than 8 years ago | (#12802336)

On a good day, with a following wind, Niagara might be able to do 8 integer instructions per second...
Uh, I believe they said it was 1.4GHz, not 1Hz.

Steam Engine - Diesel (5, Insightful)

kpp_kpp (695411) | more than 8 years ago | (#12801921)

Some people have predicted this move for quite some time. I remember hearing about it back in the late '80s/early '90s, and I'm sure it goes way back before then. The analogy was to steam engines and why they lost out to diesels. You can only make a Steam engine so big but you cannot connect them together to get more power. With diesels you can hook many of them together for more power. Chips are finally getting to the same point -- it is more cost-efficient to chain them together than to create a monstrous one. I'm surprised it has taken this long to get here.

Re:Steam Engine - Diesel (4, Insightful)

turgid (580780) | more than 8 years ago | (#12802040)

The problem has been the cost of software development. It's almost always cheaper to throw more hardware at a problem than invest in cleverer code. Highly parallel designs require very clever code. The Pentium 4 debacle has finally shown that we're now at the stage where we're going to have to bite the bullet and develop that cleverer code. With ubiquitous high-level languages running on virtual machines (e.g. Java) this is becoming more feasible, since a lot of the gory details and dangers can be hidden from the average programmer.

Re:Steam Engine - Diesel (3, Informative)

arkanes (521690) | more than 8 years ago | (#12802423)

You cannot hide the gory details and also thread for (pure) performance, at least not to any significant degree, and not with our current ability to analyze programs. Some current compilers/languages can squeeze out some parallelism via analysis, but to prevent bugs they must be conservative, so you rarely get significant performance boosts. The key to parallelizing for performance is minimizing information sharing, and that's a design/architectural issue that can't really be addressed automatically. It's not simply a matter of higher-level languages or cleverer code - the inherent complexities and dangers of multi-threaded programming are quite large, to the point where it's almost impossible to prove the correctness of any significantly multithreaded application while still gaining a performance boost.

Note that I am talking about pure performance gain here, not perceived performance, such as keeping a GUI responsive during long actions - that kind of MT is generally slower than the single-threaded alternative, and is fairly easy to keep correct.

Gaining performance via multithreading requires you to separate out multiple calculations, with minimal dependencies between them. The number of applications that can benefit from this is much smaller than you might think. I doubt very much that we'll see very many applications get a boost from dual/many-core processors, and it's not just a matter of "re-writing legacy apps". What we will see is overall system speed increases on multi-threaded OSes.

Re:Steam Engine - Diesel (2, Insightful)

spotvt01 (626574) | more than 8 years ago | (#12802145)

It's all about scalability in processor architecture, and unfortunately your analogy about diesel engines only goes so far. You can only chain so many pistons together before you have to worry about how efficiently you can transfer the energy to the drive train; there is an upper bound of effectiveness. Concentrating on the number of pistons and ignoring each piston's capabilities will leave you with a lot of horsepower but little torque. The same problem exists in multiple-core designs, namely: only so many things can be done in parallel. This is because most programs are sequential in nature and benefit very little from executing their code in parallel. And eventually you'll get down to something sequential like the bus, or access to memory, or paging to the hard disk (which is where the real bottleneck is anyway). About the only thing this will help with is if you're doing some sort of mathematical computing (using MPI or something like that, as was previously mentioned) or you're playing Doom3 while you're rendering the special effects for Star Wars III. In which case you need to get out more ;)

Re:Steam Engine - Diesel (0, Flamebait)

MajorDick (735308) | more than 8 years ago | (#12802197)

"You can only make a Steam engine so big but you cannot connect them together to get more power"

That has to be bar none one of the DUMBEST things I have ever heard on slashdot.

Mind you I understand it was not you that said it, no problem there, but whoever said that originally had best stay away from ANYTHING mechanical.

Re:Steam Engine - Diesel (1)

flaming-opus (8186) | more than 8 years ago | (#12802275)

This has been going on for years. IBM gave up on bigger single CPUs around 1980, and so did Cray, Cyber, and Unisys. Everyone has been doing multiprocessors for decades now. The only new thing is that they are sticking lots of them on a single piece of silicon instead of one per chip (or multiple chips per CPU, as the case may be).

EPIC? (1, Insightful)

Anonymous Coward | more than 8 years ago | (#12801924)

So does this mean that Intel's gamble with the Itanium was a good one? Or does this mean that we are going to try to teach students a totally new development style for more threads and parallelism?

WTF? (4, Funny)

Timesprout (579035) | more than 8 years ago | (#12801929)

and we don't know the answers, but I pulled together some of the questions."

What is this now, Questions for Nerds. Stuff we don't know?

well at least he seems to understand the problems (5, Interesting)

Anonymous Coward | more than 8 years ago | (#12801936)

from TFA:
"Problem: Legacy Apps You'd be surprised how many cycles the world's Sun boxes spend running decades-old FORTRAN, COBOL, C, and C++ code in monster legacy apps that work just fine and aren't getting thrown away any time soon. There aren't enough people and time in the world to re-write these suckers, plus it took person-centuries in the first place to make them correct.

Obviously it's not just Sun, I bet every kind of computer you can think of carries its share of this kind of good old code. I guarantee that whoever wrote that code wasn't thinking about threads or concurrency or lock-free algorithms or any of that stuff. So if we're going to get some real CMT juice out of these things, it's going to have to be done automatically down in the infrastructure. I'd think the legacy-language compiler teams have lots of opportunities for innovation in an area where you might not have expected it."

Re:well at least he seems to understand the proble (1)

jstott (212041) | more than 8 years ago | (#12802105)

"Problem: Legacy Apps You'd be surprised how many cycles the world's Sun boxes spend running decades-old FORTRAN, COBOL, C, and C++ code in monster legacy apps that work just fine and aren't getting thrown away any time soon. There aren't enough people and time in the world to re-write these suckers, plus it took person-centuries in the first place to make them correct.

Well, the Fortran programs have an easy solution---just recompile with a modern compiler designed for these CPUs. Any loop that can be automatically unrolled can be parallelized instead. Loop parallelization has been a standard Fortran optimization on parallel architectures for decades. Yes, this can be done with other languages as well, but historically it hasn't been (I expect either due to a lack of demand, or because it's harder to accommodate language features [things like strict aliasing], or both).
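To make that concrete (a minimal sketch in C with an OpenMP-style directive rather than Fortran, and only valid if the iterations really are independent):

    /* Scale a vector. No iteration depends on any other, so the
     * directive lets the compiler split the loop across hardware
     * threads; without OpenMP support the pragma is simply ignored. */
    void scale(double *x, double a, int n)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            x[i] *= a;
    }

Whether the compiler can prove that independence automatically is the hard part, of course.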

-JS

Re:well at least he seems to understand the proble (1)

daVinci1980 (73174) | more than 8 years ago | (#12802342)

Any loop that can be automatically unrolled can be parallelized instead.
Please unroll the following loop automatically (not FORTRAN, but simple enough to translate):
int AccumulateLoopCount(int N) {
    int accumulator = 0;
    for (int i = 1; i < N; ++i) {
        accumulator += i;
    }
    return accumulator;
}
Now make the code parallel.

(I realize that this solution could actually be computed at compile-time for any known value of N, and I realize that there is a formula to compute this answer in constant time). My point is that just because a loop can be unrolled automatically (this loop can) does not mean that it can be executed in parallel. Executing this code in parallel would result in a *massive* performance hit or a tremendous memory size explosion.

Re:well at least he seems to understand the proble (1)

strider44 (650833) | more than 8 years ago | (#12802123)

But those decades-old apps can easily be handled by one core in its spare time. I'm not sure why this is an issue.

Re:well at least he seems to understand the proble (1)

Sique (173459) | more than 8 years ago | (#12802443)

Because sometimes the sheer amount of data those applications have to calculate has increased. Or because a calculation that was once done once a week during the weekend, on several machines with separate data groups in parallel, is now done as an instant report at the fingertips of a clueless manager who just wants the numbers to be up-to-date (of course THIS calculation can be parallelized, not in an algorithmic way but by separating independent data).

Re:well at least he seems to understand the proble (1)

archeopterix (594938) | more than 8 years ago | (#12802163)

from TFA:
I guarantee that whoever wrote that code wasn't thinking about threads or concurrency or lock-free algorithms or any of that stuff.
Well, perhaps it's a job for the compiler to make that code thread-aware, at least to some degree. Two consecutive function calls that you (the compiler) know to be independent? Execute them in parallel. A loop running over 10000 independent objects? Split it into k loops, 10000/k objects each.

Of course the compiler has severe limits as to what it can really guess (the "independent" part can be very hard to prove), but at least once you've written it, you can run it on all your apps for free.
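Spelled out by hand, the k-way split looks roughly like this (a C/pthreads sketch; update_object is a hypothetical stand-in for whatever per-object work the loop body does, and it is only safe if the objects really are independent):

    #include <pthread.h>

    #define N_OBJECTS 10000
    #define K 4                        /* number of worker threads */

    extern void update_object(int i);  /* hypothetical per-object work */

    static void *worker(void *arg)
    {
        int t = (int)(long)arg;
        int chunk = N_OBJECTS / K;
        int lo = t * chunk;
        int hi = (t == K - 1) ? N_OBJECTS : lo + chunk;
        for (int i = lo; i < hi; ++i)
            update_object(i);          /* iterations must not share data */
        return NULL;
    }

    void update_all(void)
    {
        pthread_t tid[K];
        for (int t = 0; t < K; ++t)
            pthread_create(&tid[t], NULL, worker, (void *)(long)t);
        for (int t = 0; t < K; ++t)
            pthread_join(tid[t], NULL);
    }

The transformation itself is mechanical; proving the "independent" part is where the compiler earns its keep.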

How is this different from having multiple cores? (1, Offtopic)

MichaelSmith (789609) | more than 8 years ago | (#12801939)

...and isn't this the challenge being addressed by DragonFly BSD? [dragonflybsd.org]

Software people use threads already, as long as the VM and OS are up to the task. I don't see why it should matter if some of the threads are implemented in hardware.

Re:How is this different from having multiple core (1)

root-kun (755141) | more than 8 years ago | (#12802012)

To the desktop user, this really means nothing special. But when we're talking about producing a 1024-node system, or even some high-end 1U racks for SMB markets, the more parallelism on the chip, the better.

Vader vs. Brooks? (3, Funny)

hraefn (627340) | more than 8 years ago | (#12801941)

I almost thought this was going to be about Star Wars nerds being forced to watch something on Country Music Television.

Re:Vader vs. Brooks? (2, Insightful)

Aspasia13 (700702) | more than 8 years ago | (#12802068)

I almost thought this was going to be about Star Wars nerds being forced to watch something on Country Music Television.

Look out! It's Garth Vader!

Re:Vader vs. Brooks? (1)

Durinthal (791855) | more than 8 years ago | (#12802086)

Vader vs. Brooks?

Well, both of them did have another name they were called at one point..

One Weenie's Perspective (5, Funny)

Anonymous Coward | more than 8 years ago | (#12801953)

Well I am a Star Wars weenie, and I am definitely NOT ready for Country Music Television.

Re:One Weenie's Perspective (0)

Anonymous Coward | more than 8 years ago | (#12801989)

Don't be hating country music.

Question: What needs multiple threads? (2, Interesting)

dostert (761476) | more than 8 years ago | (#12801987)

As a scientific programmer, all I know is that this will eventually be a huge benefit to all my MPI and OpenMP codes.

I really only know the "scientific" programming languages, but most all math specific routines are already written for parallel machines. I'm a bit curious, what else really needs multiple threads? Isn't the benefit of dual-core procs the ability to not have a slow-down when you run two or three apps at a time? Don't games like DOOM III and Half-Life II depend mostly on the GPU (which I'm guessing they can handle multiple core GPU's since the programming should be fairly similar to SLI?)? What is the benefit in games? Just faster level loading times?

I don't want to sound like I'm whining or anything here... I'm not saying that multiple cores suck. On the contrary they're fantastic for what I do, but I just was hoping you guys could help me understand how common apps and non-mathematical operations can use them.

Re:Question: What needs multiple threads? (5, Insightful)

Frit Mock (708952) | more than 8 years ago | (#12802141)


In games the AI of non-player-characters (-objects) can profit a lot from threading.

But for common apps... I don't expect a big gain from multiple threads. I guess typical apps like browsers and word processors have a hard time utilizing more than 3-4 threads for the most common operations a user does.

Re:Question: What needs multiple threads? (2, Insightful)

TheKidWho (705796) | more than 8 years ago | (#12802144)

umm, better physics and AI for games is what I can think of off the top of my head =)

Re:Question: What needs multiple threads? (2, Interesting)

James McP (3700) | more than 8 years ago | (#12802170)

The simplest example is OS runs on one, the game another. But it's really not that simple. Let's take a typical Windows box since it's the bulk of the market.

Thread 1: OS kernel
Thread 2: firewall
Thread 3: GUI
Thread 4: print server
Thread 5-7: various services (update, power, etc)
Thread 8: antivirus
Thread 9: antivirus manager/keep-alive
Thread 10-16: spyware (I said a typical Windows box)
Thread 17+: applications

Yeah, CMT will be handy out of the box as long as the OS is aware. I expect it will be wasteful for the first couple of iterations, but I can't count the number of times I've had to disable antivirus and yank the ethernet while running computationally intense applications.

Re:Question: What needs multiple threads? (1)

CrayzyJ (222675) | more than 8 years ago | (#12802424)

While it sounds nice on the surface, your idea really does not work. The cache will thrash like mad, the I/O bus will be clogged, and paging may be a bottleneck. What if all 17 threads make a system call at the exact same time? The locking will bring the system to a screeching halt.

What you propose (not a horrible idea, btw) requires much more than just some threads in the CPU.

Re:Question: What needs multiple threads? (1)

rayzat (733303) | more than 8 years ago | (#12802265)

This really isn't a "should we do single-core or multi-core design" issue at this point. Because of new issues arising from shrinking dimensions and diminishing returns on new hardware technology, the only real option for future designs is multi-core. If you think about it, any single-core system is going to have a max speed; once that max speed is reached, the only way to make it faster is to add more cores. The only way a single-core system will be able to remain supreme is with new architectures, and the only way to really utilize those is to re-write your code. So if you have to re-write your code anyway, you might as well re-write it for the system with the most possibility. As for how common apps can use them: some apps probably won't be able to exploit them and will run slower than they would with 100% resource allocation on a single-core system. However, multiple applications' code can run concurrently on the different cores, so even though the individual applications run slower, you get more done.

Can use, not needs! (2, Interesting)

try_anything (880404) | more than 8 years ago | (#12802379)

If single-threaded performance improvements slow down, and the available computing power is spread out among multiple cores, anyone persisting in writing single-threaded code will fall behind in performance.

Remember the old days when people used fancy tricks to implement naturally concurrent solutions as single-threaded programs? The future is going to be just the opposite. Any day now we'll see a rush toward languages with special support for quick, clear, safe parallelism, just like we've seen scripting languages catch on for web programming.

Re:Question: What needs multiple threads? (0)

Anonymous Coward | more than 8 years ago | (#12802405)

but most all math specific routines

It looks like you're not also a scientific paper writer. It should be "but almost all math-specific routines". Having said that, the "most all" error is far too common. Think about it. Most. All. Most all. It almost sounds contradictory. Really, though, it's just bad style. See Strunk's commonly misused words and phrases [bartleby.com] .

Re:Question: What needs multiple threads? (2, Informative)

timford (828049) | more than 8 years ago | (#12802418)

You high-and-mighty scientific code snobs looking down on us game programmers! =)

Actually there is a whole lot more to games like Doom III and HL2 than what can be run on the GPU. First of all, a lot of the graphics-related code is never run on the GPU; it's run on the CPU (for example shadow-processing code), which then passes the info on to the GPU to do the actual rendering.

Secondly, multiple-core GPUs don't make that much sense to me. The nature of graphics processing is completely SIMD (like much of your scientific code, probably). Graphics needs parallelism, but it doesn't need different code being run in parallel. It needs so much parallelism because there are millions of vertices and pixel fragments, each of which needs to be handled very much the same way (that is, with the same shader code). The main reason SLI exists is that there is a limit to how much parallelism we can put on one chip because of transistor limits and all that mumbo jumbo. If we reach the point where we could put multiple cores on one GPU, then we might as well have one core with twice the number of pipelines.

Finally, games like D3 and HL2 do a lot more than just graphics and level-loading. Physics are getting more and more realistic and therefore computationally intensive (HL2 has particularly good physics). Also I think we're on the brink of game AI becoming much more advanced than the simple state machines present in current games. Then there are more eccentric tasks like UnrealEngine3's "Seamless World Support" which constantly shuffles in and out resources so you can create huge worlds without loading times.

Re:Question: What needs multiple threads? (1)

paulpach (798828) | more than 8 years ago | (#12802427)

Most server software can use multiple threads or multiple processes.

For example apache:

When two people make a request to apache, you could serve one at a time. In that case, the second person will wait a relatively long time, especially if the first request happens to be a slow one.

To solve that problem, apache spawns multiple threads or processes (depending on configuration) and serves both requests at the same time.

Normally the OS alternates CPU between the two tasks. At any given time only one request is being processed by the CPU, but over time both requests appear to be executing at the same time. There is significant overhead jumping between the requests, and if there are cache misses, the CPU just stalls for a little bit.

Better scenario: the hardware can do multithreading. In this case there is only one CPU, but the hardware (rather than the OS) alternates processing between the two requests. This way the OS does not incur the task-switching overhead, and if there is a cache miss, the hardware automatically switches to the other task (which hopefully won't have a cache miss) without wasting time. This is what Sun is doing here, and what Intel does with hyperthreading.

Best-case scenario: you have multiple cores, and both requests can be processed by different cores truly at the same time. This is what AMD and Intel are doing with dual and quad cores.

Note multiple core and hardware threading don't have to be mutually exclusive. You can have multiple cores and each core support multiple threads. In fact, this is what you get when you have a dual P4 computer.

So to answer your question: almost any server software, such as Apache, Samba, PostgreSQL, MySQL, BIND, and many others, will greatly benefit from hardware threads or multiple cores, so long as the server executes requests in a multithreaded or multiprocess fashion.
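Boiled down, the thread-per-request shape looks roughly like this (a C/pthreads sketch, not Apache's actual code; handle_request is a hypothetical stand-in for the real work):

    #include <pthread.h>
    #include <sys/socket.h>

    extern void handle_request(int fd);   /* hypothetical request handler */

    static void *serve(void *arg)
    {
        handle_request((int)(long)arg);   /* a slow request only blocks this thread */
        return NULL;
    }

    void accept_loop(int listen_fd)
    {
        for (;;) {
            int fd = accept(listen_fd, NULL, NULL);
            if (fd < 0)
                continue;
            pthread_t tid;
            /* one thread per request; the OS, or CMT hardware, interleaves them */
            pthread_create(&tid, NULL, serve, (void *)(long)fd);
            pthread_detach(tid);
        }
    }

With hardware threads the picture is the same; the interleaving just moves down a layer.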

re: Ready for CMT? (-1, Redundant)

Anonymous Coward | more than 8 years ago | (#12801996)

Yikes,
I thought he meant Country Music Television.
Nothing could prepare me for that...

Good article! (0)

Anonymous Coward | more than 8 years ago | (#12802009)

I suspect we're all going to have to look to languages that really do support very high levels of parallelism from the get-go. We're going to need a high-performance language and a scripting language. From my early days as a computer scientist, I'd say anything functional will serve us really well, especially languages like CAML and Scheme.

Big Hairy Package (5, Funny)

Tweak232 (880912) | more than 8 years ago | (#12802018)

"The hardware guys are getting ready to toss this big hairy package over the wall:"

Vivid imagery...

Screw CMT; Time to use wasted CPU (1)

WindBourne (631190) | more than 8 years ago | (#12802025)

Look, if you have 32 threads operating at 1/32 of GHz, or you have 1 thread operating at 2GHz, then it is a basic wash (not really, but close enough).

I would be far more interested in taking advantage of all the CPU cycles that run all over at Businesses. Think of how many wasted cycles there are running a screen saver or a Word document. By distributing the load amongst those systems, a large number of things could be done.

Re:Screw CMT; Time to use wasted CPU (1)

Tweak232 (880912) | more than 8 years ago | (#12802120)

Think of how many wasted cycles there are running a screen saver or a Word document.

I think that cycles spent running a Word document are not wasted, as they are used for productivity, whereas a screen saver's are not. It is not fair to compare the two.

And of course this waste has been recognized before; what you are talking about is distributed computing. For example, GIMPS and SETI@home both use unused CPU cycles, so you get 100% of your CPU time going to something important. It would be nice if businesses found a way to distribute their big jobs, but the fact is that most users do not need all the power they have for mundane things such as word editing and e-mail.

Think of how many wasted cycles there are running a screen saver or a Word document.

Do you mean you want people to always use emacs, or what I said above?

Re:Screw CMT; Time to use wasted CPU (2, Interesting)

David McBride (183571) | more than 8 years ago | (#12802122)

I would be far more interested in taking advantage of all the CPU cycles that run all over at Businesses.

Condor [wisc.edu] .

Re:Screw CMT; Time to use wasted CPU (0)

Anonymous Coward | more than 8 years ago | (#12802257)

Look, if you have 32 threads operating at 1/32 of GHz, or you have 1 thread operating at 2GHz, then it is a basic wash (not really, but close enough).

Except that the 2GHz takes major power to run (and then to cool), and if the one thread stalls your CPU is doing nothing.

With the slower CPU you get lower TCO as well; even if a couple of threads stall on I/O or memory, you're still using the CPU for useful work.

Programming isn't up to it (5, Interesting)

Toby The Economist (811138) | more than 8 years ago | (#12802026)

32 threads in hardware on one chip is the same as 32 slow CPUs.

Current programming languages are insufficiently descriptive to permit compilers to generate usefully multi-threaded code.

Accordingly, multi-threading currently has to be handled by the programmer, which by and large doesn't happen, because programmers are not used to it.

A lot of applications these days are weakly multi-threaded - Windows apps for example often have one thread for the GUI, another for their main processing work.

This is *weak* multi-threading, because the main work occurs within a single thread. Strong multi-threading is when the main work is somehow partitioned so that it is processed by several threads. This is difficult, because a lot of tasks are inherently serial: stage A must complete before stage B, which must complete before stage C.

The main technique I'm aware of for making good use of multi-threading support is that of worker-thread farms. A main thread receives requests for work and farms them out to worker threads. This approach is useful only for a certain subset of problem types, however, and within the processing of *each* worker thread, the work done itself remains essentially serial.
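For the curious, the worker-farm shape is roughly this (a C/pthreads sketch with a trivial fixed-size queue; do_work is a hypothetical stand-in for the per-request processing, which stays serial inside each worker):

    #include <pthread.h>

    #define QSIZE    64
    #define NWORKERS 8

    extern void do_work(int job);          /* hypothetical per-job processing */

    static int queue[QSIZE];
    static int head, tail, count;
    static pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

    /* main thread: receive a request and farm it out */
    void submit(int job)
    {
        pthread_mutex_lock(&lock);
        if (count < QSIZE) {               /* drop work if the farm is saturated */
            queue[tail] = job;
            tail = (tail + 1) % QSIZE;
            count++;
            pthread_cond_signal(&nonempty);
        }
        pthread_mutex_unlock(&lock);
    }

    /* worker threads: pull jobs forever */
    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0)
                pthread_cond_wait(&nonempty, &lock);
            int job = queue[head];
            head = (head + 1) % QSIZE;
            count--;
            pthread_mutex_unlock(&lock);
            do_work(job);                  /* the work itself is still serial */
        }
    }

    void start_farm(void)
    {
        for (int i = 0; i < NWORKERS; ++i) {
            pthread_t tid;
            pthread_create(&tid, NULL, worker, NULL);
            pthread_detach(tid);
        }
    }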

In other words, clock speeds have hit the wall, transistor counts are still rising, the only way to improve performance is to have more CPUs/threads, but programming models don't yet know how to actually *use* multiple CPU/threads.

El problemo!

--
Toby

Re:Programming isn't up to it (1)

Ann Elk (668880) | more than 8 years ago | (#12802140)

32 threads in hardware on one chip is the same as 32 slow CPUs.

So, Sun managed to put an NCR Voyager on a single chip? Uhh... cool?

Re:Programming isn't up to it (1)

dchallender (877575) | more than 8 years ago | (#12802221)

Far too many years ago I remember using Helios "parallel C" on a transputer network (in this case actually a network of PCs, each PC modeling one transputer). 1. The language "enhancements" available encouraged more parallelism than normal with only tiny changes to the coding approach (a lot of the work done by the compiler, obviously). 2. A lot of the parallelising was done by the "controller" that parceled out work to the "transputers" (apologies for bad terminology; this was around 15 years ago and my memory of those days is hazy). I'm sure that these days similar minor "add-ons" to common languages to encourage high-level parallelism, coupled with some beefy analysis at the compiler level and a dynamic run-time analysis tool that could spot parallelism opportunities while the application is running (as all programmers know, some optimizations are not obvious at the code-analysis stage and only become apparent when the code executes), would go a long way. I'm currently working on projects that would massively benefit from multi-threaded CPUs - currently work is farmed out from a central server to multiple processing clients, and having multi-threaded CPUs would help this enormously.
--
Dave

Re:Programming isn't up to it (4, Interesting)

flaming-opus (8186) | more than 8 years ago | (#12802323)

You are absolutely incorrect.
Multi-threaded programming is the predominant programming model on servers. Some tasks, such as web serving, mail serving, and to some degree databases, scale almost linearly with the number of processors. All of the first-tier, and some of the second-tier, server manufacturers have been selling 32+-way SMP boxes for years. They work pretty damn well.

Sun is not trying to create a chip to supplant Pentiums in desktops. They are not going for the best Doom3 performance. They want to handle SQL transactions and IMAP requests, and most likely are targeting this at JSP in a big way.

As a user of a slightly aged sun SMP box, I'd rather have those many slow CPUs and the accompanying I/O capability, than a pair of cores that can spin like crazy waiting for memory.

Sun Fortress, Haskell and Erlang (1, Insightful)

Anonymous Coward | more than 8 years ago | (#12802410)

and other such languages will become more popular as this new multithreaded world takes hold because they embed the multithreaded concepts into the language without explicit programmer interaction. C, C++, Java style threading and mutex constructs are error-prone and awkward to use.

Missing the point (1, Insightful)

Anonymous Coward | more than 8 years ago | (#12802067)

All of these recent articles about multiple cores and multiple pipelines of execution seem to miss the real value of this technology: the provisioning of multiple Virtual Machines in real time on the same system. While most software will never use the multi-thread, multi-CPU capabilities of even the quad-core AMD chips, products like VMware now allow you to dynamically provision systems on demand to deal with load. Another great use is server consolidation; instead of 10 1U racks to handle web farming, try a 16-way box that can provide a single point of reliability, management and execution for those services. This is about horizontal scaling in a vertical fashion.

OLTP systems (2, Informative)

bunyip (17018) | more than 8 years ago | (#12802070)

Now of course, the room was full of Sun infrastructure weenies, so if there's something terribly obvious in records management or airline reservations or payroll processing that doesn't parallelize, we might not know about it.

Well, since I work in airline reservations systems, I'll add my $0.02 worth...

Most OLTP systems will benefit from CMT and multi-core processors. We had a test server from AMD about a month before the dual-core Opteron was announced, we did some initial testing and then put it in the production cluster and fired it up. No code changes, no recompile, no drama.

IMHO, the single-user applications, such as games and word processors, will be harder to parallelize.

Alan.

What a totally vague and useless post, yipee! (2, Insightful)

tomstdenis (446163) | more than 8 years ago | (#12802076)

First off, performance + Java != good idea. Not trying to camp fanbois here, but if you really need "down to the metal" performance you're writing in C with assembler hotspots.

So the observation that there is too much locking in Java's standard API is informative but not on-topic. The fact that the standard solution is to use a completely new class [e.g. StringBuilder] is why I laughed at my college profs when they were trying to sell their Java courses by saying "and Java is well supported with over 9000 classes!".

In the C and C++ world things get extended but also fixed at the same time. We can still use the strncat function which has been around for a while EVEN IN threaded environments...

Also, he totally fails to point out that extra threads [e.g. register sets] only pay off when the pipeline is empty. So it's a catch-22. You either have a very efficient pipeline that you can cram full of a single thread's instructions, or you have a shoddy one where your only hope is to mix in other threads.

Think about it. If you only have one ALU and 32 threads that means each individual thread works at 1/32 the normal speed. Even if they're a lower/higher priority!

That then gets into two camps. Are you threading because the performance of the pipeline sucks [e.g. dependencies in the P4], or because you want to interleave instructions [e.g. twice the clock rate but half the performance]? If it's the latter, then even if you turn off 31 of 32 threads you still end up with one weak ALU.

Consider the AMD64 for instance. It usually gets an IPC that is pretty high [usually in the 1.5-2.5 range] which means that it's retiring instructions from a single thread at pretty much the entire capacity of the chip. Adding extra threads doesn't help.

Consider then the P4. It usually gets an IPC of 0.5 to 1 [for ALU code, which is observable by the fact that it's about as fast as a half-clockrate Pentium-M]. This means its two ALUs are not always busy, and an additional thread could bump the IPC up to the 1-1.5 range.

I know [for instance] that with HT turned on my 3.2GHz Prescott compiles LibTomCrypt in close to the same time as my 2.2GHz AMD64 [the P4 takes 5 seconds longer; without HT it takes about 15 seconds longer].

So the only saving grace is an efficient ALU so that you can run single tasks at least somewhat efficiently. Then tacking on the extra threads doesn't help as an efficient ALU won't have many bubbles where other threads could live.

So you end up with essentially a hardware register file but still 1/2 the performance. Remember that the goal of multi-processing is closer to 'n' times faster with n processors.

The BEST a single core multi-thread design can hope for is the performance of a single core single thread design...

Whoopy...

Multi-threading is NOT the future. Multi-cell is. Where you have dedicated special purpose [re: space optimized] side-cores that do things like "I can do MULACC/load/store REALLY REALLY QUICK!!!".

In other words, "yet another press release on /.".

Tom

Re:What a totally vague and useless post, yipee! (0)

Anonymous Coward | more than 8 years ago | (#12802329)

The BEST a single core multi-thread design can hope for is the performance of a single core single thread design...

And what's the best a DUAL core multi-thread design can hope for? Which after all is what the article is talking about (well actually OCTA core)

Re:What a totally vague and useless post, yipee! (0)

Anonymous Coward | more than 8 years ago | (#12802382)

So the observation that there is too much locking in Java's standard API is informative but not on-topic. The fact that the standard solution is to use a completely new class [e.g. StringBuilder] is why I laughed at my college profs when they were trying to sell their Java courses by saying "and Java is well supported with over 9000 classes!".

I thought Java was great at uni because I didn't know about Perl. Surely the point of APIs and abstraction is that it shouldn't matter what the guts are, the user just uses the StringBuffer without caring whether the JRE puts locks in or not.

We all are (1)

3770 (560838) | more than 8 years ago | (#12802096)

We all are.

If one of your favorite applications happens to be multithreaded then that's gravy.

But you'll benefit anyway. If you bring up your process list you'll see that you have probably at least 10 processes. These will now be able to run independently.

Also, the Windows kernel itself can benefit from hardware threads.

Re:We all are (1)

tomstdenis (446163) | more than 8 years ago | (#12802125)

This is total f'ing hype. If you have an efficient ALU, multi-threading won't help crap [on the hardware front; it does help in software where you may have blocked threads, etc.].

Think about it this way. You have one car that can carry you and your buddies to work at 50mph and two cars that can take you and your buddies to work at 30mph.

Sure the two cars let you do independent things but when you're working on one task [getting to work] you're not ahead.

In a video game context, for instance, you do have multiple threads, but the big ones are where 99% of the time is spent [e.g. AI, TL, models]. Giving EQUAL processing resources to something as trivial as audio or network code isn't very smart.

Hyperthreading only pays off for the Intel P4 because the ALU is so notoriously weak that it has the bubbles in the pipeline that another thread can fill.

This isn't true about all processors. Sure HT could work with the AMD64 but you'd see such a marginal [if any] improvement that the size increase would make it cost ineffective.

Tom

What doesn't scale (and what does) (1)

davecb (6526) | more than 8 years ago | (#12802098)

Last year I was at a big commercial shop, looking at the performance of a bunch of billing-like programs, and noticed:
  • Some older C, C++ and embedded-SQL programs are written without consideration of parallelization: they're single-process single-thread.
  • If the customer is large, the majority of the single-process single-thread programs have been rewritten to allow one to run multiple instances, so they can use more than one CPU.

The latter can scale on multi-processors, and mostly do. Much of our performance work centered on finding out how many processes to run, and whether to group them all on one processor board to get short memory access times. Plus fixing obvious things, like O(n^2) algorithms.

In my personal opinion, the considerations for older programs are as follows:

  1. Can you change the start-up of single-process single-thread programs to split up the input data and run multiple instances? (Sketched below.)
  2. Are there any bad algorithms in use, such as singly-linked lists for large data stores? This has nothing whatsoever to do with CMT at first glance, but it turns out to be a limit on the performance you're using multiple instances to achieve!
  3. Is there data shared between the instances? If so, you will have to add locking, which is slowish on large multiprocessors, and arguably faster on CMT processors with very good memory locality.

So: adding CMT makes it a good idea to parallelize older programs; O(n^2) algorithms in CMT or multi-CPU programs are every bit as bad as in uniprocessor programs; and introducing locking is bad, but locking on CMT needs to be measured against regular multiprocessors to see if it's going to be better (my speculation) or worse.
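Point 1 in its crudest form (a C sketch using fork; process_slice is a hypothetical stand-in for running the existing single-threaded code over one chunk of the input):

    #include <sys/wait.h>
    #include <unistd.h>

    extern void process_slice(int slice, int nslices);  /* hypothetical: the old
                                                            single-threaded work,
                                                            restricted to one chunk */

    /* Split the input into nslices pieces and run one process per piece,
     * so an unmodified single-threaded program can keep several CPUs
     * (or CMT threads) busy. */
    void run_partitioned(int nslices)
    {
        for (int s = 0; s < nslices; ++s) {
            pid_t pid = fork();
            if (pid == 0) {              /* child: do one slice and exit */
                process_slice(s, nslices);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)           /* parent: reap all the children */
            ;
    }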

--dave

Am I the only one... (1)

roach2002 (77772) | more than 8 years ago | (#12802153)

Am I the only one who thought a bunch of SoftWare Weenies were going to be ready for Country Music Television?

(Man I'm having a bad case of the Mondays)

Need a breakthrough in hiding concurrency (2, Insightful)

argent (18001) | more than 8 years ago | (#12802154)

Every time someone exposes concurrency at some layer as a way of improving performance, rather than because you're implementing a process that's inherently concurrent, it's a huge clusterfuck. Doesn't matter whether it's asynchronous I/O, out-of-order execution, multithreaded code, or whatever. Even when you're dealing with a concurrent environment like a graphical user interface the most successful approaches involve breaking the problem down into chunks small enough you can ignore concurrency.

One of UNIX's most important features is the pipe-and-filter model, and one of the really great things about it is that it lets you build scripts that can automatically take advantage of coarse-grained concurrency. Even on a single-CPU system, a pipeline lets you stream computation and I/O where otherwise you'd be running in lockstep alternating I/O and code.

That's where the big breakthroughs are needed: mechanisms to let you hide concurrency in a lower layer. Pipelines are great for coarse-grained parallelism, for example, but the kind of fine grain you need for Niagara demands a better design, or the parallelism needs to be shoved down to a deeper level. Intel's IA64 is kind of a lower level approach to the same thing where the compiler and CPU are supposed to find parallelism that the programmer doesn't explicitly specify, but it suffers from the typical Intel kitchen-sink approach to instruction set design.

Hdw multi-thread vs multi-CPU (1)

Intron (870560) | more than 8 years ago | (#12802158)

Isn't the big issue cache? On a multi-CPU system running one thread per CPU, each thread has its own cache. On HMT, the cache is shared. Threads running in different sections of code on different data will tend to reduce cache hits, offsetting the performance gain of the multiple threads. The limit on increasing the number of threads is that most of the threads will be waiting on cache misses.

The bottlenecks (3, Interesting)

davecb (6526) | more than 8 years ago | (#12802184)

CMT is a good approach for dealing with the speed mismatch between CPUs and memory, our current Big Problem.

I'll misquote Fred Weigel and suggest that the next problem is branching: Samba code seems to generate 5 instructions between branches, so suspending the process and running something else until the branch target is in I-cache seems like A Good Thing (;-)).

Methinks Samba would really enjoy a CMT processor.

--dave

Re:The bottlenecks (0)

Anonymous Coward | more than 8 years ago | (#12802307)

I thought Samba had a problem with reader/writer contention on its internal tables. I thought there was going to be some work putting in a lock-free solution, but I don't think anything came of it. With 32 cores the contention will get much, much worse.

dead end (2, Insightful)

cahiha (873942) | more than 8 years ago | (#12802205)

Threads are actually one of the simplest forms of parallelism to deal with, and we have had decades of experience with them. That's why Sun loves them: it fits in well with their big-iron philosophy and hardware and makes it easy for their customers to migrate to the next generation.

But the future of high-end computing, both in business and in science, will not look like that. Networks of cheap computing nodes scale better and more cost-effectively. Many manufacturers have already gone over to that for their high-end designs. That's where the real software challenges are, but they are being addressed.

Processors with lots of thread parallelism will probably be useful in some niche applications, but they will not become a staple of high-end computing.

How to make code run fast? (3, Interesting)

Apreche (239272) | more than 8 years ago | (#12802234)

Easy. These days there are some assembly instructions that can be executed simultaneously. With a chip like this, however, all bets would be off: instead of just a meager few instructions that could be executed simultaneously, you would be able to execute any number of instructions simultaneously.

So if you have a function that, say, does 10 additions and 10 moves, you would first figure out if any of them need to be done before or after the others. Then see which ones don't matter. Then write the function to do as many at once as possible.

It really doesn't matter for anyone other than the compiler writers. Those guys will write the compiler to do this kind of assembly level optimization for you. The trick is writing a high level language, or modifying an existing one, so the compiler can tell which things must be executed in order and which can be executed side by side.
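
A toy illustration of the dependence analysis being described (the function names are made up): the first block of additions is independent and can issue side by side, while the second forms a chain that has to run strictly in order.

    /* The four sums below have no data dependencies on one another,
     * so a compiler or CPU is free to execute them simultaneously. */
    int independent(int a, int b, int c, int d,
                    int e, int f, int g, int h)
    {
        int s0 = a + b;
        int s1 = c + d;
        int s2 = e + f;
        int s3 = g + h;
        return (s0 + s1) + (s2 + s3);
    }

    /* Here every step needs the previous result, so nothing overlaps. */
    int dependent(int a, int b, int c, int d)
    {
        int s = a + b;
        s = s + c;
        s = s + d;
        return s;
    }
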

let's go verb hunting! (0, Offtopic)

illtron (722358) | more than 8 years ago | (#12802238)

This year, more threads next year.

Hmm, I can't seem to find one. For arguably one of the tech-savviest sites on all of the Internet, Slashdot contributors have surprisingly awful grammar.

We hear a lot about the lack of technical education and preparation for engineering and science careers these days, but sometimes it looks like English instruction is just as bad.

I'm not looking for perfection, and sometimes, as in comments, speed matters more than grammatical accuracy; but when you're submitting a story, it really can't hurt to read it over to make sure it meets elementary-school standards.

P.S., there is no such word as "virii." There, now this is officially off-topic.

500W power supply? (1)

sgt scrub (869860) | more than 8 years ago | (#12802246)

"So, given that CMT chips use less watts per unit of computing, why aren't..."

I think the "requires a 500W power supply" part should answer this question.

What will this Cell-based system look like? - Our Speculation
- Motherboard supports up to 4 Cell chips
- Each Cell chip will have its own Rambus main memory. The memory will be on plug-in strips, much like DDR etc.
- The Cell chips on the motherboard will cooperate by means of FlexIO, which is a multi-lane serial technology.
- There will be two slots meant for video cards. Similar to AGP but designed for Rambus, not AGP compatible, 10x faster than AGP.
- All other I/O will be done by means of FlexIO, similar to what is now possible with USB - except the system will boot from FlexIO.
- There will be no legacy hardware support - no PCI, AGP, USB, serial, parallel, PS/2, Ethernet - nothing.
- The power supply will need to be about 500 watts.
- Power management will allow Cell chips and parts of Cell chips to be powered down when not in use.
- There will be 16 FlexIO ports coming out the back: 2 in and 2 out for each Cell chip.
- Clusters can be created by stacking Cell boxes and connecting them with FlexIO cables.

http://cellsupercomputer.com/power_pc.php [cellsupercomputer.com]

My CPU left me, and the Flatscreen died.... (1)

the_weasel (323320) | more than 8 years ago | (#12802248)

Am I the only person who was wondering why Slashdot was talking about Country Music Television for a moment there?

* crickets *

Time to hand in my nerd badge I guess, and slink off into the sunset.

Seriously, though - thanks for clarifying the meaning of CMT in the blurb. A big step forward from the usual Slashdot blurb.

Does anyone read these? (0)

Anonymous Coward | more than 8 years ago | (#12802292)

Maybe the need for smaller transistors and wires on chips has been fueling the growing nanotech industry; if so, we should keep working on smaller and faster chips even if they aren't always practical.

Already been done... (0)

Anonymous Coward | more than 8 years ago | (#12802320)

Chuck Moore (inventor of Forth) has been banging out these kinds of systems for years. Take, for example, his 25x Microcomputer chips, [forthfreak.net] which are essentially 25 computers on a 7 square mm die.

It was designed using his Forth-based CAD software, probably running on one of his earlier CPUs.

Here is a tip for you hardware guys... (0, Troll)

dwalsh (87765) | more than 8 years ago | (#12802351)

If you want us to accommodate your inability to improve single threaded performance and rearchitect 20 years of software for parallel computing, then how about this:

DON'T CALL US WEENIES! Ya bunch of Verilog writin', pocket protector wearing misfits, who take six months to implement what we can do in five lines of code, and cannot maintain app. integrity even in a single core non-hyperthreaded CPU! (See here: http://www.comp.nus.edu.sg/~abhik/pdf/pact04.pdf [nus.edu.sg] ).

Yours Sincerely,
A Software Engineer.

Performance (1)

ZuggZugg (817322) | more than 8 years ago | (#12802370)

It's funny: Sun claimed a 15x performance increase with Niagara about 16 months ago, but they never bothered to put that claim into any context. 15x the then-900 MHz SPARC III? I seriously doubt it. I doubt it's even 15x their low-end SPARC IIe in their now-discontinued blades.

It appears that Sun's engineers have hit a MHz wall sooner than the likes of Intel/AMD/IBM and are going for extreme parallelism instead.

Based on what I've read, the Niagara CPU will only be deployed in single-socket servers... the only thing it might be useful for is front-end web servers and light-duty app servers. It doesn't sound like FP performance will be too exciting, so I doubt it will find its way into render farms.

I would like to see a showdown between the IBM/Toshiba Cell and Niagara.

It's my opinion that the Sun engineering team is in serious trouble.

Re:Performance (0)

Anonymous Coward | more than 8 years ago | (#12802412)

I agree, I don't think many people want to go back 5-7 years in single-thread performance.

PHP and multi-threading (0, Flamebait)

Alt_Cognito (462081) | more than 8 years ago | (#12802403)

This guy and his newsflashes: PHP was not designed for extensive multi-threaded coding.

And guess what? It never will be. Why should it? It's a scripting language designed to return webpages. Webpages ought to be simple and return quickly. These Sun Java weenies seem to think you should be rewriting the webserver all over again, designing in tricky multi-threaded support. No wonder we have all these "appservers" doing double duty as webservers. It's completely redundant, and frankly, most of them are shoddy compared to Apache when it comes to reliability.

Of course, he knows this, and in the following paragraph he talks about how wonderful the performance of Apache will be on these new chips.

What is the problem here? (1)

borud (127730) | more than 8 years ago | (#12802415)

Every time someone mentions systems with more processors or more cores, there is a lot of whining from people who think that making software take advantage of more processors is such a monumental task.

It isn't. And it isn't just scientific number crunching that would benefit from genuinely concurrent processing in typical desktop computers; plenty of these PCs already do things that can be parallelized.

Image processing, for instance. For many kinds of image processing it isn't even hard to partition the problem so that you can make use of more than one processor. I use my PC for processing pictures taken with a digital SLR. A lot of people I know do video editing on a PC, and some even have small home studios for music production centered around their PCs or Macs.
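
As a concrete sketch of how easy that partitioning can be (the image size, thread count, and names here are invented for illustration): brightening a photo splits cleanly into one band of rows per thread, with no locking needed because the bands don't overlap.

    #include <pthread.h>
    #include <stdint.h>

    #define W 4096
    #define H 3072
    #define NTHREADS 4

    static uint8_t image[H][W];         /* pretend this holds the photo */

    struct band { int row0, row1; };

    static void *brighten(void *arg)
    {
        struct band *b = arg;
        for (int y = b->row0; y < b->row1; y++)
            for (int x = 0; x < W; x++) {
                int v = image[y][x] + 20;           /* simple brightness bump */
                image[y][x] = v > 255 ? 255 : v;    /* clamp to 8 bits */
            }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        struct band bands[NTHREADS];

        for (int t = 0; t < NTHREADS; t++) {
            bands[t].row0 = t * (H / NTHREADS);
            bands[t].row1 = (t + 1) * (H / NTHREADS);
            pthread_create(&tid[t], NULL, brighten, &bands[t]);
        }
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);
        return 0;
    }
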

Even if you are not running multithreaded applications that are heavily CPU-bound, multiple CPUs or CPU cores are useful. My desktop computer is currently running 108 processes; on cursory inspection, between 3 and 6 of them were marked as "runnable", yet I have only one CPU. I'd probably benefit from another CPU or three, even though right now I'm not doing anything that requires a lot of CPU grunt.

There is no problem. It isn't as hard as people say to make use of more processors, more cores or more low-level support for multithreading. If anyone is trying to make you believe there's a big problem, you can safely ignore them.
