
Grid Processing

Hemos posted more than 10 years ago | from the all-together-now dept.

Hardware 130

c1ay writes "We've all heard the new buzzword "grid computing" quite a bit in the news recently. Now the EE Times reports that a team of computer architects at the University of Texas plans to develop prototypes of an adaptive, gridlike processor that exploits instruction-level parallelism. The prototypes will include four TRIPS (Tera-op Reliable Intelligently adaptive Processing System) processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second. In an age where clusters are becoming more prevalent for parallel computing, I've often wondered where the parallel processor was. How about you?"


130 comments


FIRST POST (-1, Offtopic)

Anonymous Coward | more than 10 years ago | (#6962880)

The CEO does it!

Re:FIRST POST (-1, Offtopic)

Anonymous Coward | more than 10 years ago | (#6962936)

I'm so happy for you [tinyurl.com]

Just out of curiosit... (-1, Redundant)

Anonymous Coward | more than 10 years ago | (#6962947)

A question for anyone with such experience:

I assume it would be somewhat difficult to program efficiently for such systems. I don't mean just getting programs to run, but getting the most bang for your buck. Can anyone here confirm or deny this? Also, does anyone know where to find resources on the topic of programming such machines (and no, I am not talking about SMP docs or Beowulf docs or even PVM docs)?

I am thrilled about the prototypes, which will include four TRIPS processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second.

It could mean a miraculous day for computer users everywhere.

Re:Just out of curiosit... (-1, Troll)

s0m3body (659892) | more than 10 years ago | (#6963028)

> IT could mean a miraculous day for computer users everywhere.

imagine a new m$ [office] worm spreading 1000 times faster ...

Post A1 (-1, Offtopic)

ellem (147712) | more than 10 years ago | (#6962883)

I'm the first on the Grid!

Re:Post A1 (-1)

akpcep (659230) | more than 10 years ago | (#6962889)

Failed. Miserably.

Yep (1, Funny)

akadruid (606405) | more than 10 years ago | (#6962886)

Yep, that's just what I wondered.

And before anyone says it, no I have ever thought about a beowolf cluster of those...

Re:Yep (1)

hackrobat (467625) | more than 10 years ago | (#6964980)

beowulf

!

Yes (-1, Troll)

Anonymous Coward | more than 10 years ago | (#6962888)

Yes, I have often wondered where it is as well. Probably with my car keys (since I can never find those either).

Terminator? (0)

kompiluj (677438) | more than 10 years ago | (#6962899)

I have seen something like this somewhere... it was in Terminator II!!!

Re:Terminator? (0)

Textbook Error (590676) | more than 10 years ago | (#6962965)

You may also have seen it here [slashdot.org] a fortnight ago. Check it out - it's a site with loads of geek stories like this. Oh, wait...

Sure (-1, Offtopic)

Amonynous Coward (705852) | more than 10 years ago | (#6962902)

In an age where clusters are becoming more prevalent for parallel computing I've often wondered where the parallel processor was. How about you?

I'm fine, thanks for asking.

For the rest of us (0, Offtopic)

Moderation abuser (184013) | more than 10 years ago | (#6962906)

Grid Engine is free, available on most of the Unix platforms, easy to set up and allows you to scale your processing pretty much linearly.

If you want to know more, I'd be happy to consult at $300/hour.

Re:For the rest of us (3, Funny)

Anonymous Coward | more than 10 years ago | (#6963019)

If you want to know more, I'd be happy to consult at $300/hour.

Which is why most of your tech jobs are being shipped overseas.

Re:For the rest of us (1)

Adm1n (699849) | more than 10 years ago | (#6964223)

Do you speak Hindi or Mandarin? Because they do not speak English, nor do they have the command of it us overpaid North American techno geeks maintain. Thus you will have your nice financial analysis or project outline impress your executives, warranting the $300/hr. Besides, we live here and have ready access to technology. They do not, and as a result must be trained (i.e., we ship some of us there and pay god-awful amounts to do so) in both technical skills and language, not to mention customer service.

Re:For the rest of us (-1, Flamebait)

Anonymous Coward | more than 10 years ago | (#6963060)

If you want to know more, I'd be happy to consult at $300/hour.

Wow, you charge the same rate as your mother! But I bet your services don't include sperm, vomit, blood, and a large black man with a whip and leather mask. Or mabey they do, I dunno.

Re:For the rest of us (1)

fitten (521191) | more than 10 years ago | (#6963833)

"allows you to scale your processing pretty much linearly."

Interesting promise. I guess it depends on what you mean by "pretty much"...

Would it be possible... (2, Interesting)

PakProtector (115173) | more than 10 years ago | (#6962911)

To make a brick of these things, or some kind of cube, with massive processing power that one could just carry around and interface with via their PDA?

Just think about carrying around something as fast, if not faster, than your desktop that fits in the palm of your hand.

Re:Would it be possible... (0)

Anonymous Coward | more than 10 years ago | (#6962983)

If those 'bricks' fit in the palm of your hand, then imagine how many of those bricks would fit in your desktop machine. No technology would ever make PDA's faster than desktop machines, since that same technology would be applied for them.

Re:Would it be possible... (1)

Not_Wiggins (686627) | more than 10 years ago | (#6963324)

Ah... one step closer to making the Borg a reality... ;)

Re:Would it be possible... (1)

aj444 (702753) | more than 10 years ago | (#6964483)

just think about carrying around a pulsating pink sludge inside your skull, imagine the massive parallel processing power

Battletoads anyone? (1)

jeffy210 (214759) | more than 10 years ago | (#6962924)

My first thought when I saw "Trips" was "Total Reality Integrated Playing System" from Battletoads... What's next, we're going to get sucked into the gamescape? :)

Re:Battletoads anyone? (1)

paradesign (561561) | more than 10 years ago | (#6963193)

nah, that's TRON.

Just out of curiosity.... (5, Interesting)

exebeoex (561339) | more than 10 years ago | (#6962928)

A question for anyone with such experience:

I assume it would be somewhat difficult to program efficiently for such systems. I don't mean just getting programs to run, but getting the most bang for your buck. Can anyone here confirm or deny this? Also, does anyone know where to find resources on the topic of programming such machines (and no, I am not talking about SMP docs or Beowulf docs or even PVM docs)?

Re:Just out of curiosity.... (2, Informative)

Stone316 (629009) | more than 10 years ago | (#6962939)

I'm not sure about other platforms, but in the case of Oracle, they say you don't require any code changes. Your application should run fine right out of the box.

Re:Just out of curiosity.... (0)

Anonymous Coward | more than 10 years ago | (#6963131)

Grid computing is bullshit, the technology is ancient. Hell I was playing Battleship on my 386 years ago. A-10, hit or miss? You could probably do that in QBasic. Don't listen to that guy that wants to charge $300 and hour, you can find it in the bargain bin at Gamestop.

Re:Just out of curiosity.... (2, Interesting)

Adm1n (699849) | more than 10 years ago | (#6963253)

Hypercube theory handles this quite well. Addressing would be n-dimensional; you can google "hypercube" and find lots of nifty SGI docs for their old Onyx architecture, but it also applies to Beowulfs, PVM, MPI, Cray and any other massively parallel architecture. This would be a hypercube on a chip as opposed to a hypercube of chips. And I'm not going to mention the complexities of queueing theory, but at 32nm it's a doctoral thesis waiting to happen.

Re:Just out of curiosity.... (4, Informative)

gbjbaanb (229885) | more than 10 years ago | (#6963388)

Most parallel systems only work for a certain type of problem - one where processing can be split into many small chunks, each one independent of the others.

E.g., who cares how many instructions you can process in parallel if module A requires data from module B? In these cases parallelisation is limited to making each module run faster (if it doesn't have sub-dependencies, of course); the entire program doesn't benefit from the parallelisation.

Good examples of parallel processing are the ones we know - distributed apps like SETI@home, graphics rendering, etc.

Bad candidates are everyday data processing systems - they typically work on a single lump of data at a time, in sequence.

A good source on parallel programming is http://wotug.ukc.ac.uk/parallel/ or, of course, google.
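The split the parent describes can be sketched in Python (a toy example; the functions and chunk counts here are invented for illustration): independent chunks farm out to a worker pool, while a chain where each step consumes the previous step's result cannot be split at all.

```python
from multiprocessing import Pool

def render_tile(tile_id):
    # Independent chunk: no tile needs another tile's result,
    # so all tiles can run in parallel (like SETI@home work units).
    return sum(i * i for i in range(tile_id * 1000, (tile_id + 1) * 1000))

def dependent_steps(n):
    # Each iteration consumes the previous result, so extra
    # processors cannot shorten this chain.
    acc = 0
    for i in range(n):
        acc = (acc + i) % 997
    return acc

if __name__ == "__main__":
    with Pool(4) as pool:
        tiles = pool.map(render_tile, range(16))  # parallel part
    serial = dependent_steps(16000)               # serial part
    print(len(tiles), serial)
```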

Fortran 95 oddly enough is multi-processor aware. (4, Informative)

goombah99 (560566) | more than 10 years ago | (#6964351)

Fortran is NOT for everyday programming of word processors and such. However, the modern Fortran language probably ought to be the choice for most scientific programming; it's just that people think of it as an "old" as in decrepit language and don't learn it.

For parallel processing, Fortran boasts many language-level features that give ANY code implicit parallelism, implicit multi-threading and implicit distribution of memory WITHOUT the programmer cognizantly invoking multiple threads or having to use special libraries or overloaded commands. An example of this is the FORALL and WHERE statements that replace the usual "for" and "if" in C.

FORALL (I = 1:5)
WHERE (A(I,:) /= 0.0)
A(I,:) = LOG(A(I,:))
END WHERE
CALL some_slow_disk_write(A(I,:))
END FORALL

The FORALL runs the loop with the variable "I" over the range 1 to 5, but in any order, not just 1,2,3,4,5, and of course it can be done in parallel if the compiler or OS, not the programmer, sees the opportunity on the run-time platform. The statement is a clue from the programmer to the compiler not to worry about dependencies. Moreover, the program can intelligently multi-thread so the slow disk-write operation does not stop the loop on each iteration.

The WHERE is like an "if" but tells the compiler to map the if operation over the array in parallel. What this means is that you can place conditional tests inside of loops and the compiler knows how to factor the if out of the loop in a parallel and independent manner.

Moreover, since the WHERE and FORALL tell the compiler that there are no memory-dependent interactions it must worry about, it can simply distribute pieces of the A array to different processors without having to maintain concurrency between the pieces used by different processors, thus eliminating shared-memory bottlenecks.

Another parallelism feature is that the header declarations not only declare the "type" of a variable, as C does, but also whether the routine will change that variable. This lets the compiler know that it can multi-thread and not have to worry about locking an array against changes. In the example, the disk-write subroutine would declare the argument (A) to be immutable. Again the multi-threading is hidden from the user, with no need for laborious "synchronize" mutex statements. It also allows for the concept of conditionally mutable data.

Other rather nice virtues of Fortran are that it uses references rather than pointers (like Java), and, amazingly, the syntax makes typos that compile almost impossible. That is, a missing +, =, comma or semicolon, the wrong number of array indices, etc. will not compile (in contrast to ==, ++, =+ and [][] etc.).

One sad reason the world does not know about these wonderful features, or repeats the myths about the Fortran language missing features, is GNU. Yes, I know it's a crime to criticize GNU on Slashdot, but bear with me here, because in this case they deserve some of it for releasing a non-DEC-compatible language.

For the record, ancient Fortran 77 as well as modern Fortran 95 DOES do dynamic allocation, support complex data structures (classes) and have pointers (references) in every professional Fortran compiler. Sadly GNU Fortran 77, the free Fortran, lacks these language features, and there is no GNU Fortran 95 yet. This lack prevents a lot of people from writing code in this modern language. If GNU g77 did not exist, the professional compilers would be much more affordable. So I hope some reader who knows about compiler design is motivated to give the languishing GNU Fortran 95 project the push it needs to finish.

In the age of ubiquitous dual processing, Fortran could well become a valuable scientific language due to its ease of programming and resistance to syntax errors.
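For readers who don't write Fortran, here is a rough NumPy analogue of the masked WHERE update discussed above (an illustrative sketch of the semantics only, not a claim about how Fortran compilers implement it):

```python
import numpy as np

# Rough analogue of:  WHERE (A(I,:) /= 0.0)  A(I,:) = LOG(A(I,:))
# NumPy applies the masked update element-wise with no ordering
# promise, similar to the freedom WHERE gives the compiler.
A = np.array([[1.0, 0.0, np.e],
              [0.0, np.e ** 2, 1.0]])
mask = A != 0.0
A[mask] = np.log(A[mask])  # zero entries are left untouched
print(A)
```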

MOD UP +insightful+informative (0)

Anonymous Coward | more than 10 years ago | (#6964570)

My God, that is probably the most insightful and informative comment I have ever read on slashdot, bar none!

Re:Fortran 95 oddly enough is multi-processor awar (2, Informative)

sketerpot (454020) | more than 10 years ago | (#6964868)

There's a good book explaining a lot of this stuff in detail available from O'Reilly [oreilly.com]. I can vouch for it having some neat stuff, and it covers how to write Fortran in such a way as to take advantage of the parallelism features.

Uh oh, Terminator androids will rule the earth! (3, Funny)

scorp1us (235526) | more than 10 years ago | (#6962951)

Anyone remember from T2 what the CPU looked like? It was a 3-dimensional grid of CPUs...

Don't say I didn't warn you!

Re:Uh oh, Terminator androids will rule the earth! (0)

Anonymous Coward | more than 10 years ago | (#6963227)

I believe the machines have decided against killing John Connor and all the tedious mucking about with Skynet and military C&C computers, and have now opted for the far more subtle plan of getting one of their machines into office as the President of the United States of America, and have him simply push the button himself.

This is not "Grid Computing" (4, Interesting)

pridkett (2666) | more than 10 years ago | (#6962953)

This is not an example of the Grid Computing (a la Globus [globus.org]) that we've been hearing about. This is another example of laying out processor cores on a chip. So a better thing would be to compare this to the ideas for the UltraSPARC V and IBM BlueGene computers, where multiple processing cores are put on one chip and then arranged in a grid (think physical grid) architecture.

Grid Computing deals with computation and information sharing seamlessly across a network; they used to always say it's like how the power grid works. Which in reality is about right, as it doesn't always work as advertised.

Anyway, Grid Computing is mainly concerned with software to allow multiple computers to work together seamlessly. This includes registry services, single sign-on, information transfer, etc.

This appears to be the rather unfortunate result of a phenomenon called "buzzword collision", where two different projects pick the same buzzword in hopes of really confusing people who don't read the articles and tricking PHBs into thinking that each project is ueberimportant.

Re:This is not "Grid Computing" (2, Interesting)

Rolken (703064) | more than 10 years ago | (#6962989)

They do work on the same principle though. It's just that grid computing on a network involves processors that are vastly separated and consume different resources, whereas the "new" grid computing involves tightly bound, hardwired processors that share resources. It's not like you have to be an engineer to figure out the difference... and if you don't read about it and you get confused, that's your own fault. ;)

Re:This is not "Grid Computing" (-1, Troll)

Anonymous Coward | more than 10 years ago | (#6963081)

Interesting, if not slightly askew, points. However I would be remiss if I didn't mention how disgusting I think it is when you pull your penis out of Hemos' ass and he puts it in his mouth when you're about to cum. I mean, it's his OWN SHIT AND BLOOD he's putting in his mouth! Gross.

Buzzword collision (1)

BigGerman (541312) | more than 10 years ago | (#6963180)

Naturally, if it was called Slow-poke Faulty, Blindly Restrictive Processing System I would be less inclined to trust this solution.

Re:This is not "Grid Computing" (3, Interesting)

Rufosx (693184) | more than 10 years ago | (#6963624)

If this really was just a grid layout of cores on a chip, then no, I would not call it grid computing.

But from looking at the diagram and rereading the article a few times, I think this goes far beyond that and approaches something that really could be called grid computing.

Instead of just being issued instructions from a central control unit, these units seem to have far more developed abilities to communicate with each other and work together. Not just for the issuing of instructions, but during execution.

No Surprise (-1, Offtopic)

Anonymous Coward | more than 10 years ago | (#6962958)

because I /. I don't RTFA, and since I don't RTFA all that entered my brain was 'uni students' and 'trips' and I thought... well, surprise surprise

Image a... (-1, Redundant)

Anonymous Coward | more than 10 years ago | (#6962961)

Beowulf cluster of those.

Had to be said.

Re:Image a... (0)

Anonymous Coward | more than 10 years ago | (#6963977)

Ahh, there it is. That's the post i came here to see ;)

What about Transputers? (5, Interesting)

Tangurena (576827) | more than 10 years ago | (#6962966)

Transputers were processors designed from the ground up for parallel processing. Have been around for years, but no one in America noticed them. Therefore they did not exist. I am surprised at the constant reinvention of the wheel, because of the NIH principle (Not Invented Here).

There are some programming languages designed for parallelism. Biggest hassle is efficiently partitioning problems into something parallel. Not all problems can be done faster by doing more of it at once.

connection machines too (1)

entartete (659190) | more than 10 years ago | (#6963114)

http://mission.base.com/tamiko/cm/
the connection machine was another parallel computing system (64k little bitty processors hooked together into a grid) that had a flurry of excitement around it (almost 70 of them in operation at the peak of activity!) and then sorta died off. alot of the problems with systems like this weren't really flaws in the basic idea, just economic issues. If you can make a cheap non parallel system run some ugly hack of solution to the problem in something semi close to the time that the elegant but expensive as far as price per unit of processing custom made parallel system can, then people aren't going to want to invest the money in a specialized system unless there is just no other way possible to get their work done without it. but being able to apply the same techniques that let moore's law bumble along should help things out.

Re:What about Transputers? (3, Funny)

marktoml (48712) | more than 10 years ago | (#6963340)

Oh, you mean 9 women can't have the baby in a month? Crap. Another good plan shot to hell.

Re:What about Transputers? (2, Informative)

GregAllen (178208) | more than 10 years ago | (#6963536)

no one in America noticed them
We used transputers on quite a large number of projects right here at the University of Texas.

the NIH principle
Actually, the problem was that they were slow and complicated. They went so long between family upgrades that eventually we could replace a large array of transputers with a few regular CPUs. Not to mention that we can also get a handy little thing like an OS on general purpose CPUs.

programming languages designed for parallelism
Did I mention complicated? Occam was part of the problem. The scientific world wants to program in C or Fortran, or some extension of them, or some library called by them. That's why MPI is so popular.

not all problems can be done faster by doing more of it at once
I'm not sure I agree. Having more capability at each compute node means less need for partitioning. (The part you say is hard.)

Obviously there's a lot of work to be done in parallel processing. You can hardly blame Inmos's problems on geography (or America for Inmos's problems). They looked very promising for awhile, but just didn't keep up.

Re:What about Transputers? (4, Interesting)

AlecC (512609) | more than 10 years ago | (#6963703)

Obviously there's a lot of work to be done in parallel processing. You can hardly blame Inmos's problems on geography (or America for Inmos's problems). They looked very promising for awhile, but just didn't keep up.

Seconded, loudly. Inmos was a classic case of great engineering trashed by lousy management. When the transputer came out, it was fantastic, leading-edge stuff. But Inmos turned everybody off by saying that you had to use it their way and no other.

The thing that shows how good the transputer was is that it was still selling ten years after it first came out, when it had been overtaken and lapped several times by conventional CPUs. But that cannot go on forever; by the time they died, you could simulate a transputer on a conventional CPU that cost less but ran faster.

Re:What about Transputers? (1)

Alioth (221270) | more than 10 years ago | (#6964757)

I believe there was a Unix-like OS for transputer systems (IIRC). I went to college at the UWE, where we had the Bristol Transputer Centre, and Inmos itself was quite nearby (I think I did either a 1st or 2nd year undergraduate project which involved Inmos). I remember they said that they tried to present an image that Inmos was a US firm in the US to help with marketing, since home grown stuff apparently sells better in the US.

Back to the OS. I think it was in use by Southampton University, and IIRC the main problem with it was that it couldn't handle a segmentation fault - the entire machine would crash if a program died with SIGSEGV. I'm sure the administrators really enjoyed running that system...

Sun may already be ahead of the game here(!) (3, Informative)

pr0ntab (632466) | more than 10 years ago | (#6962967)

Normally I don't pimp Sun, but here's something that makes me think they still have a finger on the pulse of things:
Read about plans for Sun's "Niagara" core [theregister.co.uk]

I understand they hope to create blade systems using high densities of these multiscalar cores for incredible throughput.

There's your parallel/grid computing. ;-)

Re:Sun may already be ahead of the game here(!) (2, Informative)

stevesliva (648202) | more than 10 years ago | (#6963526)

A more detailed article. [theregister.com] IBM has been doing dual-core processors in its flagship Power line for a few years now, although it appears higher numbers of cores per die will only be appearing in more experimental IBM projects. Except perhaps the PS3 Cell processor [arstechnica.com], a collaboration of IBM and Sony. Since the Cell group is based in Austin, there's likely to be some collaboration between TRIPS and Cell. As a matter of fact, they sound very similar.

Grid computing? (5, Informative)

dan dan the dna man (461768) | more than 10 years ago | (#6962968)

I still think this is not what is commonly understood by the term "Grid Computing". Maybe it's the environment I work in but to me Grid Computing means something else [escience-grid.org.uk]

And is exemplified by projects like MyGrid [man.ac.uk] .

Parallel Processors? (0)

Anonymous Coward | more than 10 years ago | (#6962972)

How about the three ALU's in me Athlon? How's that for parallel?

Grid confusion (5, Informative)

Handyman (97520) | more than 10 years ago | (#6962978)

It's funny how people always seem to find a way to confuse what is meant by a "grid". The posting talks about a "4x4 grid" without clarification of the term "grid", which is confusing because grid computing has nothing to do with processing units being lined up in a grid. The "grid" in "grid computing" comes from an analogy with the power grid, not from any form of "grid layout". The analogy is based on the fact that with grid computing, you simply plug your "computing power client appliance" (not necessarily a PC, could be the fridge) into the "computing power outlet" in the wall (a network port, usually), and you "consume computing power", like you would do with electricity. Computational grids don't even necessarily have to support parallel programs; it is easy to imagine grids whose maximum allocatable unit is a single processor. What makes such grids grids is that you can allocate the power on demand, when you need it, instead of having to have your own "computing power generator" (read: megapower CPU) at home.

Re:Grid confusion (0)

Anonymous Coward | more than 10 years ago | (#6963125)

That's why it's called Grid Processing and not Grid Computing.

What sort of computations will this be good at? (4, Insightful)

rhetland (259464) | more than 10 years ago | (#6962991)


I use parallel computing on a cluster, in which I divide up my computational domain into a number of chunks, and each chunk is farmed out to a processor. Communication between the processes is required at the chunk boundaries.

For this case, I see how my code is partitioned, and I also understand (on a general level, at least) what the limitations on speed are: information passed between the chunks.

Now, how will this processor do its 'instruction-level' parallelization? Will it be great at do loops (one 'do' per processor)? Will it be like a mini vector processor? What will break down the efficiency of the parallelization?

I have found that efficiency in parallelization is very application-dependent after about 8-32 processors. Will this break that barrier?

Most importantly, will it kick butt for MY applications?
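The decompose-and-exchange-boundaries pattern the parent describes can be sketched in a few lines of Python (a toy 1-D three-point average; the chunking and update rule are invented for illustration, not taken from any real model code):

```python
def stencil_step(chunks):
    # Each chunk is farmed out to a "processor"; before a step,
    # neighbouring chunks exchange only their boundary cells,
    # which is the sole communication in this scheme.
    new_chunks = []
    for idx, chunk in enumerate(chunks):
        left = chunks[idx - 1][-1] if idx > 0 else 0.0
        right = chunks[idx + 1][0] if idx < len(chunks) - 1 else 0.0
        padded = [left] + chunk + [right]
        # Three-point average over the interior of the padded chunk.
        new_chunks.append([
            (padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)
        ])
    return new_chunks

print(stencil_step([[1.0, 1.0], [4.0, 4.0]]))
```

The cost model falls out directly: the boundary exchange is the part that grows with processor count, which is why efficiency becomes application-dependent past a few dozen chunks.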

Re:What sort of computations will this be good at? (1)

benpeter (699832) | more than 10 years ago | (#6963174)

It would be very good at searching large key spaces, i.e. brute-forcing encrypted material by testing every possible key.

If they're cheap and you can get the density up high enough maybe AES [nist.gov] won't last as long as we thought.
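Key search parallelizes this well because each unit sweeps a disjoint slice of the key space with no communication until a hit. A sketch in Python (everything here is a hypothetical stand-in: `toy_encrypt`, the secret key and the slice sizes are invented, and this bears no relation to actual AES):

```python
from multiprocessing import Pool

SECRET_KEY = 48657  # hypothetical toy key, for illustration only

def toy_encrypt(key, plaintext):
    # Stand-in "cipher"; a real attack would call the real cipher here.
    return (plaintext * 31 + key) % 65536

TARGET = toy_encrypt(SECRET_KEY, 1234)

def search_range(bounds):
    # Each execution unit independently sweeps its own slice of the
    # key space: the embarrassingly parallel case.
    lo, hi = bounds
    for key in range(lo, hi):
        if toy_encrypt(key, 1234) == TARGET:
            return key
    return None

if __name__ == "__main__":
    slices = [(i * 8192, (i + 1) * 8192) for i in range(8)]
    with Pool(4) as pool:
        hits = [k for k in pool.map(search_range, slices) if k is not None]
    print(hits)
```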

Gridlike Computing Vs Grid Computing (3, Informative)

jedigeek (102443) | more than 10 years ago | (#6962993)


We've all heard the new buzzword, "grid computing" quite a bit in the news recently.

The article doesn't actually have anything to do with "grid computing", but the processor's design is like a grid. The term "grid computing" [globus.org] often refers to large-scale resource sharing (processing/storage).

beowulf (0, Offtopic)

Anonymous Coward | more than 10 years ago | (#6962997)

What is the theoretical size limit of a beowulf cluster? 255? 35xxx? 10^20 ?

Has anyone thought of making a beowulf cluster of beowulf clusters yet?

Re:beowulf (1)

fitten (521191) | more than 10 years ago | (#6963916)

What is the theoretical size limit of a beowulf cluster? 255? 35xxx? 10^20 ?

This is usually limited by the amount of resources on the platforms. At times, this has been governed by such things as the number of open sockets the OS supported and/or how long it took to open all the connections to all the machines before RSH or the like started timing out and closing the connections.

Has anyone thought of making a beowulf cluster of beowulf clusters yet?

Yes. Grid computing encompasses this idea (things like Globus, etc.)

Re:beowulf (1)

Adm1n (699849) | more than 10 years ago | (#6964273)

Let us expand upon an N-dimensional hypercube by adding more hypercubes; as long as the interconnects scale accordingly, processing power would scale logarithmically. I.e., a 2-dim hypercube has 2 interconnects of equal bandwidth; 4-dim has 16, so your connection bandwidth must be at least n^2 (that's N squared) where n = processors. Thus your crossbar is created. Using typical queueing theory, and provided that you have a lot of network bandwidth, your next issue is routing information in and out. So 1 teraflop is not unfathomable.

Network processors already implement this idea (0)

Anonymous Coward | more than 10 years ago | (#6962998)

Take a look at the Intel IXP-2800. A network processor containing 16 multithreaded microprogrammable 32-bit processors and an XScale core riding herd. All are tightly coupled with a multi-layered memory architecture. Fun to program.

BS & hype (4, Interesting)

master_p (608214) | more than 10 years ago | (#6963004)

The prototypes will include four Trips processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second.

At 32 nanometers, Intel could put tens of HT pentium cores on a single chip, achieving the same result.

"One key question is, Will this novel architecture perform well on a variety of commercial applications?"

For computational problems that can be broken down into parallel computations, the answer is yes. For all the other types of problems, the answer is no. Although I have to admit that most algorithmic bottlenecks are in iterative tasks that are highly parallelizable.

On Trips, a traditional program is compiled so that the program breaks down into hyperblocks. The machine loads the blocks so that they go down trees of interconnected execution units. As one instruction is executed, the next one is loaded, and so on.

*cough* EPIC *cough* VLIW architecture *cough*

I support parallelism and I am looking forward to seeing it on my desktop, as it will increase the computational power of my computer tremendously. Unfortunately, it will mean new compilers and maybe programming languages that have primitives for expressing parallelism.

By the way, the transputer [google.com] chip was promising. The idea of lots of computational units running in parallel is nothing new (maybe each memory block should have its own processor to locally process and compute the data).

Re:BS & hype (2, Informative)

Valar (167606) | more than 10 years ago | (#6963163)

It's not as much hype as you would think (in the interest of full disclosure, I am a UT EE student and about half of my posts now on /. seem to be talking about something the university has done...). Yes, grid computing is a bad term for it, because it's already taken. I'm not sure whose fault it was that it got labelled that, but I doubt it was one of the guys actually working on this. They all seem like competent lads. Now for what I actually have to say:

At 32 nanometers, Intel could put tens of HT pentium cores on a single chip, achieving the same result.
Yes, but any more than 16 logical cores and your x86 arch won't recognize them. Why? 4-bit CPU identifiers (each logical core under HT identifies itself as a normal processor).

For computational problems that can be broken down into parallel computations, the answer is yes. For all the other types of problems, the answer is no. Although I have to admit that most algorithmic bottlenecks is in iterative tasks that are highly parallelizable.
Very true, but no more true for TRIPS than for any other parallel system. Additionally, just about every computer now does a lot of things in parallel. Think of any multitasking OS. So, worst comes to worst, you can run x number of apps as normal serial executions (though TRIPS wouldn't run any currently existing commercial software -- new platform and all, and a test too, not something ready for production by any means).

Unfortunately, it will mean new compilers and maybe programming languages that have primitives for expressing parallelism.
I completely agree.

Re:BS & hype (1)

tiohero (592208) | more than 10 years ago | (#6964419)

One of the more interesting processor designs would be the FORTH-based 25xC18, using 25 C18 CPU cores, which could achieve up to 60,000 (!!!) MIPS with a very low-power design. The 25xC18 was designed by Chuck Moore. The interesting thing about the FORTH processors is that they use an extremely small instruction set (~24 instructions) and require only ~10K transistors per CPU, allowing for very fast and low-power operation. It also allows one to add on-chip DRAM right next to the core, giving 1ns memory access to a small cache. I think stack-based processors are the way to go when it comes to multiprocessor designs, since stacks allow easy pipelining of instructions and data between processors.

You can learn more about FORTH based processors here. [ultratechnology.com] Here's additional information about the 25xC18. [strangegizmo.com]

The FORTH-based processors have never become mainstream; I'm not sure why that is. If this thing ever gets into production, it will be pretty revolutionary.
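The stack model is easy to sketch (the opcodes below are invented for illustration, not the real C18 instruction set): because every instruction implicitly addresses the top of the stack, no register fields are needed in the encoding, which is how such tiny instruction sets and transistor counts are possible.

```python
# Tiny stack-machine interpreter: instructions take their operands implicitly
# from the top of the data stack, so the encoding needs no register fields.
# The opcode set here is invented for illustration, not the actual C18's.

def run(program):
    stack = []
    for ins in program:
        if isinstance(ins, int):          # literal: push onto the stack
            stack.append(ins)
        elif ins == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif ins == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif ins == "dup":                # duplicate top of stack
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown opcode {ins!r}")
    return stack

# (3 + 4) squared, written in postfix: 3 4 add dup mul
print(run([3, 4, "add", "dup", "mul"]))   # [49]
```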

Re:BS & hype (1)

0rbit4l (669001) | more than 10 years ago | (#6964571)

At 32 nanometers, Intel could put tens of HT pentium cores on a single chip, achieving the same result.

No, they couldn't, because HT pentium cores use way too much power to be packed in at that density. This (and other similar) research is based on using many simple (but fast) low-power cores, usually in an adaptive fashion. (e.g., for one app I use certain processor cores for one portion of processing, for another I use them for something else entirely - and the mapping is usually done explicitly either by the programmer or the compiler.)

All the dynamic scheduling/out-of-order logic that modern processors use consumes massive amounts of power. Current trends in computer architecture are such that an HT pentium (or similar) processor will dissipate more heat per surface area than the surface of the sun (if you follow the roadmap guidelines for projected clock speeds). In other words, it's time to examine new architectures (which is exactly what this and many other projects are doing.)

How does this compare to VLIW? (2, Insightful)

binaryDigit (557647) | more than 10 years ago | (#6963011)

Forgive me if I'm off base here, but perhaps a proccie nerd can explain the differences between this design and, say, VLIW. They seem closely related: breaking the app into parallelizable chunks and sending them to n execution units. The article doesn't mention if the Trips processing nodes can 'talk' to each other. If they can't, then this seems very similar in concept to VLIW (though different in physical and logical layout).

Check out (0)

Anonymous Coward | more than 10 years ago | (#6963031)

Grid Guru [gridguru.com] . It's a slash site devoted to Grid computing.

Can You Imagine... (-1, Troll)

Anonymous Coward | more than 10 years ago | (#6963040)

..a beowulf cluster of slashdot editors having sex with trolls?

read the comments from the horse's mouth (4, Informative)

Ristretto (79399) | more than 10 years ago | (#6963041)

This story already appeared [slashdot.org] , but was posted by someone who was not confused by the use of the term "grid"... Doug Burger, one of the two key profs on this project (and no relation!), answered lots of questions, which you can see here [slashdot.org] .

-- emery berger, dept. of cs, univ. of massachusetts

Grid posting (1)

Petronius (515525) | more than 10 years ago | (#6963715)

How long before Taco also posts it?

Why parallel processors aren't common (4, Interesting)

*weasel (174362) | more than 10 years ago | (#6963045)

... because nearly all programs are data-centric. Parallelizing the execution of code has an upper bound with regard to increased efficiency, particularly when considering the increased overhead in memory management and control flow.

Parallelizing the data processing itself (e.g. Seti@Home), whereby the data being worked on is spread amongst 'loosely parallel' execution units, is much more practical, and doesn't suffer from the overhead involved in creating parallel-processor servers, or even parallel-execution chips. It also alleviates the memory bottlenecks of parallel execution cores.

I always wondered what kind of app demands the kind of big iron that Cray and NEC churn out that couldn't be more cost-effectively realized through distributed processing amongst many independent computers (a la Google).

It seems even cyclical, result-dependent processing (weather prediction) could be coded to work in such a manner.

1000 bare-bones P4 3GHz PCs (~$600 each) have more processing power ( 2500 MFLOPS each ) than a single X1 cabinet ( 819 GFLOPS @ $2.5M ), and as you can see, for less than 1/4 of the cost.
( 2.5 TFLOPS @ $600,000 vs 819 GFLOPS @ $2.5M )
( P4 MFLOPS hit 5700 each w/ SSE2 )
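A quick sanity check of the parent's arithmetic (taking those ballpark figures at face value):

```python
# Back-of-the-envelope check of the parent's cluster-vs-X1 comparison,
# using the parent's own rough numbers.
p4_mflops, p4_cost, n = 2500, 600, 1000
cluster_tflops = n * p4_mflops / 1e6      # MFLOPS -> TFLOPS
cluster_cost = n * p4_cost
x1_gflops, x1_cost = 819, 2.5e6
print(cluster_tflops, cluster_cost)       # 2.5 TFLOPS for $600,000
print(cluster_cost / x1_cost)             # 0.24: under 1/4 the price, ~3x the peak FLOPS
```

Peak FLOPS only, of course; the interconnect is the whole argument for the X1, as the replies below note.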

Now I imagine there have to be exceptions. There -has- to be a reason to have such big iron for certain problems. There must be a reason that very smart people advise their superiors to buy around $8B of this stuff each year.

But I don't personally see the applications, and given the monumental cost of developing a new processor nowadays, the market doesn't seem to either.

So that's my $0.02 as to why complex, esoteric parallel-execution chip designs remain so rare.

Re:Why parallel processors aren't common (3, Insightful)

rockmuelle (575982) | more than 10 years ago | (#6963205)

Scientific and financial computing, especially modelling and simulation, are where parallel computers can make a difference.

Many of the approaches to these problems take the form of a grid of elements that have local and possibly non-local interactions with each other. Each processor gets a subset of the points to work with and has to communicate with the neighboring processor's memory space to get information about neighboring points.

In a cluster, handling the points at the edges (or any non-local effects) requires a network and possibly disk request. Compared to local memory, this is incredibly slow and can temporarily starve the processor.

Big iron parallel systems address this by giving more processors access to the same memory and other shared resources, avoiding the costly network requests.
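A 1-D sketch of the decomposition described above, with serial code standing in for the real machinery (in practice the boundary reads would be an MPI-style halo exchange or a shared-memory load; all names here are invented):

```python
# 1-D diffusion stencil split across "processors": each rank owns a chunk of
# the grid, and reads that cross a chunk boundary play the role of the halo
# exchange -- a network request on a cluster, a plain load on shared memory.

def step(u):
    # Reference: whole-domain update with fixed boundary values.
    return [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]

def step_decomposed(u, nprocs):
    n = len(u)
    bounds = [(p * n // nprocs, (p + 1) * n // nprocs) for p in range(nprocs)]
    new = list(u)
    for lo, hi in bounds:                 # one loop iteration per "rank"
        for i in range(max(lo, 1), min(hi, n - 1)):
            # u[i-1] or u[i+1] may live in a neighbour's chunk:
            # that read is the "halo exchange".
            new[i] = (u[i - 1] + u[i + 1]) / 2
    return new

u = [float(i * i) for i in range(16)]
assert step_decomposed(u, 4) == step(u)   # decomposition changes nothing
```

The interior of each chunk needs only local data; only the edge cells cost communication, which is why big domains with small surfaces parallelize well.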

Of course, the current supercomputers (ASCI *, etc.) are all clusters, just with incredibly fast network connections.

-Chris

Cray's Ideas (1)

Adm1n (699849) | more than 10 years ago | (#6963348)

Cray said, "Modify the algebra to suit your hardware; don't kill your processors with floating-point operations, instead use a different, more efficient method." And Intel listened. Crossbar n-dimensional hypercubes are great for decreasing the wall-clock time on complex problems, most notably simulations. The world's largest supercomputer (NEC's Earth Simulator in Japan) is used to do just that. The issue is the network bandwidth, or moving information from one cache to another. That is costly and introduces a processing tax on parallel information. Myrinet's solution is ultrafast interconnects, so your crossbar is disgustingly fast. And essentially you are still dealing with a hypercube; then the issue becomes the distribution of information into and out of the cube.

Re:Why parallel processors aren't common (1)

goombah99 (560566) | more than 10 years ago | (#6964645)

Most programming languages (C, Java) don't parallelize efficiently, and others written for parallelism are too special-purpose to warrant attention. On the other hand, there is good old Fortran, which will surprise a lot of people because it's written to allow implicit multi-processing and avoid shared-memory distribution and concurrency bottlenecks. See this post on slashdot [slashdot.org]

AKA Reconfigurable Computing (3, Informative)

yerdaddie (313155) | more than 10 years ago | (#6963054)

The ability to adapt the architecture to the workload, as discussed in this article, is common to many different reconfigurable computing architectures [businessweek.com] .
Quite a number of researchers are looking at the performance and density [berkeley.edu] advantages of reconfigurable architectures in addition to the work mentioned in this article. What's really intriguing is considering how operating systems could support reconfiguration [osnews.com] . There doesn't seem to be much work on the subject.

Wow, talk about a powerful compuer! (0)

Anonymous Coward | more than 10 years ago | (#6963056)

They can use it to think up even more lame, far-fetched acronyms.

Carly Fiorina (1)

codepunk (167897) | more than 10 years ago | (#6963072)

Carly said it is all hype and we cannot use grids for at least another 5 years. Obviously Carly does not read Slashdot or own a Linux box.

Re:Carly Fiorina (1)

Moggie68 (614870) | more than 10 years ago | (#6964132)

Another instance of someone just having to snipe at a person in high position just because the said person lacks a Y chromosome. And yes, I've got one.

And the projected date and time for (0)

BigGar' (411008) | more than 10 years ago | (#6963086)

Skynet to become self-aware is now what?

Main memory bandwidth limits HPC today (2, Insightful)

ChrisRijk (1818) | more than 10 years ago | (#6963135)

If your system is main-memory-bandwidth limited (or mostly so), even with one CPU core, then extra cores won't help (much). So this kind of design looks good only for tasks that aren't bandwidth limited, which is a much smaller market.

They don't seem to be considering business servers here, but those are more main-memory-latency limited than bandwidth limited, so multiple cores can help a lot. But you need more than simply lots of cores to have a good design. A critical thing to have is major software support, which means using an existing ISA, not a new one.

So I'd expect this to be quite an obscure product in reality.

PS3? (1)

JAgostoni (685117) | more than 10 years ago | (#6963308)

Isn't this what the PS3 is supposed to use? Some sort of grid-like structure in their processor?

Re:PS3? (1)

Adm1n (699849) | more than 10 years ago | (#6963394)

Cell is a joint venture in vaporware by Motorola and IBM; it's designed to be a "Massively Parallel Microprocessor" consisting of (last I read) 6 dies on one core, each to render a portion of your screen. Used in conjunction with your broadband connection to render information. The problem being that broadband is horribly slow in terms of speed and throughput when compared with standard Ethernet and high-speed crossbar architectures. Sony's CEO was quoted as saying that "Game developers wish to see a 1000-fold increase in computational power and we will provide it in our next platform by using Grid Computing." I wonder what the Total Cost of Operation is on a grid of PS3s?

Re:PS3? (1)

JAgostoni (685117) | more than 10 years ago | (#6963607)

Or the TCO of a Beo.... I won't do it, I just won't do it.

Re:PS3? (1)

Adm1n (699849) | more than 10 years ago | (#6964035)

Linkey Linkey [zive.net] . Isn't Google great? It's by far my favorite cluster to abuse.

The Parrallel Processor (1)

Serapth (643581) | more than 10 years ago | (#6963332)

In an age where clusters are becoming more prevalent for parallel computing I've often wondered where the parallel processor was. How about you?"

I may be thinking in different terms than you, but my understanding of future chip design is that multiple CPU cores on one chip is basically becoming the norm. To some extent, is this what hyper-threading does on the newest Intel chips? I recall also reading that the PPC G5 chip in the newer Macs has multiple processor cores.

So, to answer... where are the parallel processors? Try Best Buy! ;-)

Re:The Parrallel Processor (4, Informative)

Adm1n (699849) | more than 10 years ago | (#6964084)

No no no.
OK, HT double-clocks the cache! So you have two caches for the price of one! The G5 is a multicore chip, as are Cell Linky [zive.net] and the Opteron; the difference (apart from the arch!) is the way VLIWs are fed to each of these. They are NOT parallel processors; parallelism can be defined as the maintenance of cache coherence, which is either inclusive (Cray) or exclusive (RS6000) and requires a lot of bandwidth (local x-bar versus network). Whereas parallel computers are not cache coherent and have a remote x-bar architecture; it all adds up to the same hypercube.

Re:The Parrallel Processor (1, Informative)

Anonymous Coward | more than 10 years ago | (#6964337)

G5s aren't multiple-core CPUs. However, the IBM POWER4, which they are derived from, is a dual-core CPU.

did the memory bandwidth get scaled up as well? (0)

Anonymous Coward | more than 10 years ago | (#6963356)

Big deal if I have 40 processors on a chip if I still can't get enough memory bandwidth to feed one of them.

The Connetion Machine (2, Insightful)

bluethundr (562578) | more than 10 years ago | (#6963399)

In an age where clusters are becoming more prevalent for parallel computing I've often wondered where the parallel processor was. How about you?"

Danny Hillis, the guy who founded Thinking Machines, designed a machine called The Connection Machine [base.com] (this story [svisions.com] has a cooler, more sci-fi lookin' pic of the old beastie [svisions.com] ); the central design philosophy was to achieve MASSIVE computing power through parallelism. It had 65,536 procs, each of which lived on a wafer with DRAM thereon and a high-bandwidth connection to (if I remember correctly) up to 4 other procs. Young sir Danny wrote a book on his exploits, [barnesandnoble.com] well worth checking out (seemingly, it's been calling to me from my bookshelf for about a year now).

And as someone pointed out, it seems we've seen this topic before. [slashdot.org] I'd have modded him up, [slashdot.org] (hint, hint) but I really like mentioning the connection machine where appropriate.

Die Yields (1)

dragonknight831 (597415) | more than 10 years ago | (#6963571)

I'm just curious what will happen to yields when they are busy cramming more cores onto a single die. Already they have to discard or down-rate many of the dies on each wafer. What will happen when you have several cores, any of which might be faulty and ruin the remainder of the die?

Re:Die Yields (2, Informative)

Adm1n (699849) | more than 10 years ago | (#6964162)

Die verification will be modified to accommodate core-level verification prior to multiple cores being used. Since you are layering dies on one another, they will be verified individually, then as a whole; if they do not add up as individuals, then off to the scrap heap. But that all depends on the number of cores and the process. Keep in mind that current design software limits are around 20K layers of interconnects, so if a core is only 20 layers of interconnects (not uncommon), it's only 100 layers if it's scrap, and since it's vapor deposition the losses are negligible (comparable to white noise, or pennies on the hundred). Fabs spend more finding problems (and fixing them) than they do on materials. Yields are much more prone to design flaws and external-condition errors than to failure of a singular element (remember the Pentium floating-point error due to the capacitors not being sprayed at the right density?).

Deja Vu all over again (4, Interesting)

AlecC (512609) | more than 10 years ago | (#6963661)

This is very much not new. The basic idea has come and gone several times in the last twenty years, to my knowledge. Both SIMD and MIMD systems have been tried several times. NCR even had one called the Grid, IIRC. Thinking Machines (as seen in Jurassic Park I). The Inmos transputer was designed for exactly this sort of connectivity. Intel had a development machine (?iWarp?) which tried to use it. And I am sure there were others that I don't recall. (As a user and fan of the transputer, I used to follow the field from a distance.)

But the problem has always been the programming. Ordinary software does not map very well onto these architectures. Certain specific problems can be mapped well onto them, which results in spectacular performance claims for the system. But generally such systems perform well only on those problems for which they were specifically designed.

Communications is a common reason for failure. It scales very badly. In the early days of development, the first few processors have any-to-any connectivity, so the application will really fly. But since the connectivity rises as the square of the number of processors, this cannot hold for very long. As soon as connectivity becomes limited, communications bottlenecks start to appear, and you get processors being held up either sending messages or waiting for them to arrive. Buffering (which many did not implement in their communications architectures) helps, but it doesn't solve the problem. (A bit like lubrication: a small amount brings a considerable improvement in performance, but past a certain point, it only adds to costs.)
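The quadratic blow-up is easy to see: full any-to-any connectivity needs n(n-1)/2 links.

```python
# Link count for full any-to-any connectivity: each of n processors connects
# to the other n-1, and each link is shared by two endpoints, so n*(n-1)/2
# links in total -- fine for a handful of CPUs, hopeless past a few dozen,
# which is why real machines fall back to meshes/hypercubes plus routing.
def links(n):
    return n * (n - 1) // 2

for n in (4, 16, 64, 256):
    print(n, links(n))   # 4->6, 16->120, 64->2016, 256->32640
```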

Another problem is load balancing. It is very difficult to design your system so you don't end up with most of the CPUs waiting for one overloaded CPU to finish its job. The only architecture which really worked was the farm model: a central dispatcher sends tasks to a "farm" of identical "workers", which request work units as and when they need them. This means that the whole code for the system has to be loaded into each worker; not necessarily a killer at today's memory prices, but it would be nice to be more efficient. It also requires the task to be divisible into a very large number of chunks which can be executed independently without too much communication. OK for large-volume simulations etc., but a disaster for (say) database programming or image/voice recognition.
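The farm model is simple enough to sketch with a shared work queue (threads stand in for the worker CPUs, and all names are invented): workers pull tasks as they become free, so the load balances itself.

```python
# Worker-farm sketch: a dispatcher fills a queue and identical workers pull
# tasks whenever they are free, so load balances itself automatically.
# Threads stand in for the worker CPUs described above.
import queue
import threading

def farm(tasks, nworkers, work_fn):
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = q.get_nowait()        # pull the next work unit when free
            except queue.Empty:
                return                       # no work left: worker retires
            r = work_fn(item)
            with lock:                       # results arrive in whatever order
                results.append(r)

    for t in tasks:                          # dispatcher loads the queue up front
        q.put(t)
    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

out = farm(range(10), nworkers=4, work_fn=lambda x: x * x)
print(sorted(out))                           # squares of 0..9, completion order varies
```

Note the downside the comment mentions: every worker carries the full `work_fn`, the thread-pool analogue of loading the whole program into each worker.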

It also doesn't help that not many people really think multi-threaded in their program design. Again, no one that I know has a good object-oriented multi-threading model. Current models are analogous to either pre-structured programming or early structured programming. Which means that people, reasonably, approach multi-threading as a dangerous monster to be tackled only when absolutely necessary, with great care, and if possible in flame-proof armour. For this sort of system to be much use, we need a development which does to current threading what inheritance did to pre-OO languages: something that makes it so simple that, once over the hump of initial unfamiliarity, people use it all the time without even thinking about it.

I designed one of the larger heterogeneous transputer-based systems to ship: up to 100 transputers in 6 different roles. Load and communications balancing was a real hassle from the day the system first started to work for real, and we were constantly tuning buffers, fiddling with routing algorithms, moving bits of processing from this CPU to that to get the performance up. (Not to mention that Inmos completely blew their second-generation transputer, which we had been hoping would solve many of our problems.)

Re:Deja Vu all over again (0)

Anonymous Coward | more than 10 years ago | (#6964123)

But the problem has always been the programming. Ordinary software does not map very well onto these architectures. Certain specific problems can be mapped well onto them, which results in spectacular performance claims for the system.

I agree with your post. But isn't the issue of programming just the normal learning curve for a new model of software design? Problems that map well onto a distributed and/or parallel architecture will get huge performance improvements, but could it be that many more engineering problems can use distributed/parallel processing? Just because programmers are using procedural techniques to solve a problem, it doesn't mean the problem is procedural in nature.

Re:Deja Vu all over again (1)

Adm1n (699849) | more than 10 years ago | (#6964331)

Well, why not use the systems and algorithms in place in commercial clusters like Mosix and OpenMosix as models, and do simulations using current traffic-queueing theory prior to any hardware design? Moshe Bar developed OpenMosix as a cluster that operates by farming threads out to the most available machine, and the code is free. Using that model, one can then develop a decent hardware platform that would be able to accommodate its own monitoring system as part of the communications infrastructure.

Read the Article (2, Insightful)

EnglishTim (9662) | more than 10 years ago | (#6964687)

Read the article - this isn't a case where you've got a whole bunch of traditional processors and you try to divide the work between them. They're talking about the CPU itself being split into several smaller general units, so that each instruction gets executed by several of these units. The instructions are grouped together and then sent to the CPU in blocks. All the work for that block is then split between the units, taking into account any interdependencies. I suppose the closest thing to it would be to have microcode being executed in parallel.

Re:Read the Article (1)

AlecC (512609) | more than 10 years ago | (#6964946)

From the article: The prototypes will include four Trips processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second.

You've got a whole bunch of non-traditional processors and you try and divide the work between them.

The individual CPUs are, as you say, more flexible than current CPUs. Like hyperthreading and deep pipelining, that will bring a one-off performance improvement of perhaps two, three or four times. Not to be sneered at - but most systems can get a four-times improvement by the brute-force method of throwing, say, ten processors at it. In the small-number-of-processors range, overheads can be low enough that a multi-CPU system only loses a small fraction of its processing power on overheads. Given the economies of scale, it seems to me that it is cheaper to use off-the-shelf components rather than do a lot of expensive development of a new chip - which you will probably have to manufacture using a process a couple of years older (and therefore slower) than the latest, greatest process Intel is using for the newest top-of-the-line CPU.

Parallelism won't do much unless we get tens of parallel units working at the same time.

project home page (2, Informative)

the quick brown fox (681969) | more than 10 years ago | (#6963799)

project home page [utexas.edu]

They have some papers available there...

Doug Burger is wicked cool (0)

Anonymous Coward | more than 10 years ago | (#6963940)

I went to UT and had the privilege of taking computer architecture with Doug Burger [utexas.edu] about 4 years ago; he's one of the co-P.I.s of the project. He used to give us rock-hard exams (but buy us pizza to ensure nobody felt bummed out because of them). Also used to give tons of homework. However, he never came across as evil, just as someone who genuinely wanted us to learn. He's incredibly smart, knows what he is doing, and is good-natured to boot. Check out some of the fun quotes in his plan [utexas.edu] . I went to many lectures of his about his grid architecture project. At that time he had many graduate students working on it: some working on a simulator for the processor, some on a compiler, and some on studies of what applications are going to take advantage of the new architecture. I'm sure this project is going to turn out to be really cool.

I'm sorry. (1)

fshalor (133678) | more than 10 years ago | (#6964138)

I read this as "Girl Processing" and was preparing a WTF? comment.

Internal parsing error reported. :)

I always thought the "single" processor paradigm has gone on way too long. I guess soon we'll be able to plug in multiple processors like we do RAM. But a question:
(1/Req) = (1/R1) + (1/R2) + (1/R3) ... Wouldn't it make more sense to run them in series? :) (jk)

The Grid... (1)

hey (83763) | more than 10 years ago | (#6964197)

... As of August 14, it seems like a terrific idea for electric power. Let's copy that fine idea of depending on the perfect working of a system several states away.

karma whoring. (1)

*SECADM (223955) | more than 10 years ago | (#6964696)

Since the article doesn't really have much to do with grid computing, here are some real grid computing links.

Globus Toolkit [globus.org]
LSF [platform.com]
openPBS [openpbs.org]
gridengine [sunsource.net]
OSCAR [sourceforge.net]
ROCK MPP [rocklinux.net]
maui [supercluster.org]

and last but not least: beowulf cluster [beowulf.org]