
A Glimpse Inside the Cell Processor

Zonk posted more than 8 years ago | from the ahh-scary-close-it-up dept.

XenoPhage writes "Gamasutra has up an article by Jim Turley about the design of the Cell processor, the main processor of the upcoming PlayStation 3. It gives a decent overview of the structure of the Cell processor itself, including the CBE, PPE, and SPE units." From the article: "Remember your first time? Programming a processor, that is. It must have seemed both exciting and challenging. You ain't seen nothing yet. Even garden-variety microprocessors present plenty of challenges to an experienced programmer or development team. Now imagine programming nine different processors all at once, from a single source-code stream, and making them all cooperate. When it works, it works amazingly well. But making it work is the trick."

66 comments

eh (-1, Offtopic)

GigsVT (208848) | more than 8 years ago | (#15720270)

Somehow I think readers of a site named "Gamasutra" have probably never had a first time. I bet they have never programmed anything either.

Console gamers get consoles because they can't deal with installing video card drivers. Did the author really think they were programmers, much less compiler programmers?

Re:eh (1)

Knos (30446) | more than 8 years ago | (#15720309)

Gamasutra is not targeting gamers. It's a site for gaming industry members.

Re:eh (0)

Anonymous Coward | more than 8 years ago | (#15720557)

"Gamasutra is not targetting gamers. It's a site for gaming industry members."

Duh! That's why it's in the DEVELOPER section and not in the GAME section.

Re:eh (4, Insightful)

Zediker (885207) | more than 8 years ago | (#15720481)

"Console gamers get consoles because they can't deal with installing video card drivers."

Nope, console gamers buy consoles because they offer games that don't appear on the PC and/or don't have the money to buy a PC gaming rig. $1200+ (I'm talking building from the ground up with reliable and decent parts) just to start getting a decent computer together usually isn't as justifiable as spending $100 (GC), $130 (DS), $150 (PS2/Xbox), $200 (PSP), or $400 (360) for a console of some sort.

Re:eh (1)

not already in use (972294) | more than 8 years ago | (#15720645)

Given the choice to buy a game for a console or a PC, assuming I had both, I would go with the console every time. Case in point: the Xbox is essentially a 733MHz PC. Try getting Halo 2 to run on your 733MHz desktop with 256MB of RAM. Dedicated gaming hardware will outperform a PC any day of the week.

Re:eh (1)

PitaBred (632671) | more than 8 years ago | (#15720829)

But then you have to play Halo 2 with a shitty control setup (gamepad).
I'll pay a premium to play games comfortably, thank you.

Re:eh (1)

HiredMan (5546) | more than 8 years ago | (#15720720)

don't have the money to buy a PC gaming rig. $1200+ (I'm talking building from the ground up with reliable and decent parts)

The cost of a gaming rig isn't just in the building - it's also in the maintaining. To keep playing the newest FPS you constantly need to upgrade processors and especially GPUs. Over a longer period you will probably have to replace your whole mobo to use the new processor socket or chipset and get the new AGP 4x, AGP 8x, or PCIe slot advancement. Over that same 3-5 year period your console doesn't cost you a thing.

=tkk

Re:eh (1)

MaWeiTao (908546) | more than 8 years ago | (#15720763)

The only reason you'd need to upgrade your video card that often would be to play the latest games in full detail at the highest framerates.

If you're willing to compromise a video card will hold out fairly well for quite a bit longer, and you'll probably still see better graphics than you'd get on any console.

Nevertheless, a PC is significantly more expensive than a console, although the PS3 is doing a good job of changing that. The point is that a PC does far more than any console will ever do. And if all you're going to do with a PC is play games then you probably really are better off just buying a console. Needless to say, most people don't just use PCs for games.

Re:eh (0)

Anonymous Coward | more than 8 years ago | (#15720940)

PC hardware isn't that expensive if you don't buy new stuff. Old stuff holds its own rather well. I just got rid of my old (circa 2001) P4 system and replaced it with a D805, a new motherboard, 1 GB of DDR400, and an AGP 8x Radeon X800 GTO. Total cost was around $430 with shipping, and it will play anything I've tried on it (up to Prey; F.E.A.R. doesn't look interesting) at 1280x1024 with all possible effects on at a smooth frame rate. Granted, this wasn't intended as a long-term overhaul, but I'm so impressed with the performance that I think I'm gonna hold onto it for at least another 2 years.

Re:eh (0)

Anonymous Coward | more than 8 years ago | (#15720626)

Gamasutra is a site produced by the people who make Game Developer magazine and who also host the GDC.

Gamasutra == Best game development website.

Re:eh (0)

Anonymous Coward | more than 8 years ago | (#15720766)

yhbt,yhl,hand! roffle!

Re:eh (1)

forkazoo (138186) | more than 8 years ago | (#15721028)

Console gamers get consoles because they can't deal with installing video card drivers. Did the author really think they were programmers, much less compiler programmers?

Considering that Gamasutra is the website for Game Developer Magazine... Yes, I think the author really did expect that an appreciable percentage of his readers would be programmers.

Oh yeah, I remember my first time (4, Funny)

llamalicious (448215) | more than 8 years ago | (#15720272)

I was 17 and she was 26 and ... oh shit, wrong first time.

Re:Oh yeah, I remember my first time (5, Funny)

neonprimetime (528653) | more than 8 years ago | (#15720318)

I was 17 and she was 26 and

Perchance, did you have a MySpace account, and your parents didn't know about your little shindig?

Re:Oh yeah, I remember my first time (4, Funny)

Rakshasa Taisab (244699) | more than 8 years ago | (#15720674)

I'm not familiar with any 26 processors; surely you meant 286?

Re:Oh yeah, I remember my first time (2, Funny)

goodenoughnickname (874664) | more than 8 years ago | (#15720970)

He had sex with a 286-year old?! What was she, a wookie?

Re:Oh yeah, I remember my first time (0)

Anonymous Coward | more than 8 years ago | (#15723575)

Yeah, but she shaved...you know...down there..

Re:Oh yeah, I remember my first time (1)

jd (1658) | more than 8 years ago | (#15720777)

Cache, not cash.

Re:Oh yeah, I remember my first time (1)

linguae (763922) | more than 8 years ago | (#15721136)

She must have taught you how to use Unix or OS X for the first time, since that is the only believable answer. (This is Slashdot, after all.)

Sacrificing karma (-1, Offtopic)

utopianfiat (774016) | more than 8 years ago | (#15720352)

Can anyone give me "In The Ghetto"?

The article's author is huffing crack here... (1)

argent (18001) | more than 8 years ago | (#15720411)

There aren't many businesses where manufacturing technology exceeds design technology. Throughout human history we've been able to dream up things we can't yet build, like spaceships, skyscrapers, jet packs, underwater breathing apparatus, or portable computers. But in the semiconductor business the situation is reversed: chip makers can build bigger and more complicated chips than they can design. Manufacturing prowess exceeds design capability. We can fabricate more transistors than we know what to do with.

Not only can I dream up things to do with four million transistors, but there are always plenty of EASY and productive uses for more transistors. More cache, to begin with... when you can still buy chips today with only half a meg of cache, you know there's plenty of headroom there. Multiple cores? The aborted EV8 Alpha would have gone up to 4 regular cores per chip, and it was killed by boardroom shenanigans between Intel and HP and Compaq... not by technical or business reasons.

And that's just stuff that you could build right now, without designing anything new, if you had the transistor budget. Moving on to more speculative designs... nobody's brought GPUs into the "Tron" era yet. Where are the massively parallel raytracing GPUs with tens of thousands of relatively simple cores, each rendering a postage-stamp-size piece of the scene in photorealistic quality in realtime? There are all kinds of embarrassingly parallelizable problems this kind of thing could be applied to; rendering is only the most obvious one...

Re:The article's author is huffing crack here... (0)

Anonymous Coward | more than 8 years ago | (#15720811)

Yes, well, except that it's hard to build a system that can actually keep all those transistors doing something useful. In real life, there are damned few "embarrassingly parallelizable problems". Very few Linux and Windows apps are parallelizable, much less parallelized. The upshot: keeping multiple parallel CPUs busy is hard work for the programmer, and throwing more transistors into a mega-giant super-CPU will *not* make that CPU go faster.

Re:The article's author is huffing crack here... (1)

Hamled (742266) | more than 8 years ago | (#15720837)

I think the article's point was that once you get more and more transistors on there, it becomes very difficult to design things that don't end up overheating all the time or using insane amounts of power, not to mention just becoming extremely complex like x86 cores today.

Thus, you're right, parallelization is the answer (at least according to the Cell design philosophy). Because it's possible to put so many transistors on there, the way to do it without running into as many problems is to create a large number of parallel, simpler cores. This, of course, works best if what you are parallelizing is well suited to it, for example the vector processing that Cell is built to do.

So basically the article was reiterating the idea that while we can keep building smaller transistors and fitting more on a chip, continuing in the "old" way (or at least in the way that x86 processors are designed) is/was leading to design problems: chips that run too hot, use too much power, and need really complicated machinery for things like out-of-order execution. This, I assume, has been pretty widely known for at least a few years, since both Intel and AMD have moved towards multiple cores.

Re:The article's author is huffing crack here... (3, Insightful)

argent (18001) | more than 8 years ago | (#15721346)

I think the article's point was that once you get more and more transistors on there, it becomes very difficult to design things that don't end up overheating all the time or using insane amounts of power, not to mention just becoming extremely complex like x86 cores today.

I wasn't talking so much about the article as a whole as about the insane levels of hyperbole in the particular paragraph I quoted. "We're capable of putting more transistors on a chip than we can think of things to do with." That's not even vaguely true.

More transistors == more power, all else being equal, because it's all those junctions flipping state so quickly that uses the power.

As for the insanity of Intel's processors... that seems to be a perversion particular to Intel. In the past three decades that I've been following the industry, Intel has only managed to produce *one* sane CPU design, the i960, and they promptly caponised it by removing the MMU and relegating it to embedded controls lest it outcompete their cash cow.

The rest... from the 4004 through the 8080, the 8086 and its many descendants, the iAPX 432, i860, and Itanium... have been consistently outperformed by chips with smaller transistor budgets built by companies with far fewer resources. They only occasionally broke past the midrange of the RISC chips, and were usually trailing back with the anemic SPARC. Where they have excelled is in marketing and in the breadth of their support... both hardware and business. IBM went with the 8088 because they could get it in quantity and they could get good cheap support chips for it: if you went with Motorola or Zilog or Western Digital or National Semiconductor, you pretty much had to go back to Intel to build the rest of your computer anyway.

Re:The article's author is huffing crack here... (1)

cnettel (836611) | more than 8 years ago | (#15720855)

Except, of course, that ray tracing is not easily parallelizable, as you need to get a significant amount of data to each of those postage-stamp-size pieces (hey, that's one of the reasons that "just" rendering triangles is so much easier: you take a global problem and make it local). Wiring all those transistors would be hard. Adding cache and cores is also, to some degree, the solution when you are out of ideas. It will make things better, but it's quite an expensive way to get the scaling (especially cache).

Re:The article's author is huffing crack here... (2, Insightful)

argent (18001) | more than 8 years ago | (#15721245)

Except, of course, that ray tracing is not easily parallelizable, as you need to get a significant amount of data to each of those postage-stamp-size pieces

The mesh is common to all the processors, and it's not that big; it can be broadcast. Textures are the big chunk, but most pieces will only need high-resolution versions of the textures in their direct view... unless a processor is looking at an optically interesting surface (for reflections or refractions), it can get by with mesh-resolution approximations to the textures outside its part of the scene.

This requires new technology, yes. You need mesh caches shared among not-too-many processors, techniques to broadcast the mesh to the mesh cache efficiently, and a front-end to apportion the screen space to the processors and parcel out textures, maybe even going to a finer subdivision for "interesting" areas. But raytracing is practically the poster boy for "embarrassingly parallelizable" applications.
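
To make the postage-stamp point concrete, here's a minimal sketch of tile-per-worker rendering in plain C with POSIX threads. It's not Cell code, and trace_ray() is just a stand-in for a real tracer; the point is that the only shared state is a tile counter and the read-only scene.

    #include <pthread.h>
    #include <stdio.h>

    #define WIDTH    1920
    #define HEIGHT   1080
    #define TILE     32            /* postage-stamp-sized work unit */
    #define NWORKERS 8

    static unsigned int framebuffer[WIDTH * HEIGHT];
    static int next_tile;          /* index of the next unclaimed tile */
    static pthread_mutex_t tile_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in: a real tracer would shoot a ray through pixel (x,y)
     * into the shared mesh and return a shaded color. */
    static unsigned int trace_ray(int x, int y) { return (x ^ y) & 0xff; }

    static void *worker(void *arg)
    {
        int tiles_x = WIDTH / TILE, tiles_y = HEIGHT / TILE;
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&tile_lock);
            int t = next_tile++;   /* claim the next tile */
            pthread_mutex_unlock(&tile_lock);
            if (t >= tiles_x * tiles_y)
                return NULL;       /* no tiles left */
            int x0 = (t % tiles_x) * TILE, y0 = (t / tiles_x) * TILE;
            for (int y = y0; y < y0 + TILE; y++)
                for (int x = x0; x < x0 + TILE; x++)
                    framebuffer[y * WIDTH + x] = trace_ray(x, y);
        }
    }

    int main(void)
    {
        pthread_t tid[NWORKERS];
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(tid[i], NULL);
        printf("rendered %d tiles\n", (WIDTH / TILE) * (HEIGHT / TILE));
        return 0;
    }

Each worker just grabs the next unclaimed tile, so load balancing comes for free; scaling it to thousands of cores is a bandwidth problem, not a code problem.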

Adding cache and cores is also, to some degree, the solution when you are out of ideas.

Not to that great a degree, and we've really only scratched the surface of what we'll be doing with multi-core. DEC laid down a long-term plan for the Alpha in the early '90s, and multi-core was planned for the early '00s right from the start. Compaqtion and having Intel pull a fast one on HP weren't in their plans, but 4 or 8 cores and enough cache to keep them fed is just the next step.

Another thing we're going to see, particularly for laptops, is super-integrated chipsets. Freescale's e600 would have been the next step for Apple if they'd been faster getting it to market (or if Apple had been less reluctant to break the G4 bus compatibility and they'd gotten started sooner), and it seems to me that adding the GPU in as well makes a lot of sense. Expect to see Intel CPUs with GMAxxx (or their descendants) on-chip, and AMD cutting deals with nVidia and ATI.

Re:The article's author is huffing crack here... (1)

drinkypoo (153816) | more than 8 years ago | (#15722354)

But raytracing is practically the poster boy for "embarrassingly parallelizable" applications.

You neglected to mention the primary reason this is true: you don't have to do anything fancy, because it's fairly rare that we even need to parallelize rendering a single frame these days - most rendering involves large batches of frames which are later assembled into a video. You can always send individual frames to clients, so you can parallelize it without even doing anything hard.

memory speed? (-1, Redundant)

Janek Kozicki (722688) | more than 8 years ago | (#15720427)

When it works, it works amazingly well. But making it work is the trick.
Yes, especially when the local memory read speed is 16MB/sec [slashdot.org] (no, this is not a typo) [theinquirer.net].

Re:memory speed? (1, Informative)

Janek Kozicki (722688) | more than 8 years ago | (#15720469)

Oops, sorry. It was taken out of context, and false. That "local memory" was the memory of the graphics card, which is rarely read from (but often written to), and that "main memory" (with its nice 16GB/sec speed) was actually the system RAM.

Re:memory speed? (1)

TheFlamingoKing (603674) | more than 8 years ago | (#15721426)

Wait, what?

You spread a bunch of FUD about the PS3 and get +1 informative.
Then you correct your FUD, and also get +1 informative.

Gotta love Slashdot...

Re:memory speed? (1)

not already in use (972294) | more than 8 years ago | (#15720471)

If I remember correctly, a big deal was made about this a while back and it was put to rest. But thank you; I was worried a Sony story might go by without any sensationalism.

Re:memory speed? (5, Informative)

Space cowboy (13680) | more than 8 years ago | (#15720542)

You are misinformed.

This is the speed at which the Cell can read the RSX's local memory. Memory bandwidth for the Cell itself is ~25 GB/sec. If the Cell ever wants to access the private RAM of the RSX (why?), it *is* possible, but it's a lot more efficient to use the normal pathway through main memory...

Simon.

Re:memory speed? (1)

makapuf (412290) | more than 8 years ago | (#15720574)

And when you read some of the comments on this story, you understand that it's reading from video memory, which is of very little use. Why would you want to read from video memory?
I don't remember what read speed AGP had, but it was certainly asymmetric with respect to writing.

Re:memory speed? (1, Informative)

Anonymous Coward | more than 8 years ago | (#15720685)

It's called "picking". You render your objects using only color (no texture data); one color per object. Then you can read the color that lies directly under the mouse cursor. A simple lookup will tell you which object is under the cursor. It's only efficient in terms of implementation time. I'm sure most game development houses have very nice ray casting functions built into the engines they use, so you are right about it being of very little use. Just thought I'd answer your question.

Re:memory speed? (0)

Anonymous Coward | more than 8 years ago | (#15721133)

Yeah, because lots of CPU cycles on Playstations are spent on mouse hit-testing.

Re:memory speed? (1)

Ant P. (974313) | more than 8 years ago | (#15720839)

AGP is 6MB/s read, IIRC. I'm probably wrong, but the number was definitely less than a PCI bus.

Re:memory speed? (1)

Mr Z (6791) | more than 8 years ago | (#15722456)

It wasn't the frame buffer, but, as others have pointed out, the memory local to the RSX. Definitely a non-concern.

Debunked in same article (1)

SuperKendall (25149) | more than 8 years ago | (#15720787)

I guess you forgot to read the comments [slashdot.org] in your own link!!

25GB/sec, not 16MB/sec.

MOD PARENT TROLL (1, Interesting)

Anonymous Coward | more than 8 years ago | (#15720874)

That "news" was thoroughly debunked as anti-Sony propaganda. There is almost no reason to read from the GPU's local memory from the Cell's SPEs or PPE. If you do have a legitimate reason, to do so that requires high memory bandwidth, your design is wrong. The GPU can read/write to its memory at blazing fast speeds, and talk directly to the SPEs and PPE at very high bandwidth as well. Any use of an SPE or the PPE to read directly from the GPU's local memory is a case of insane coupling between components and as we all should know is indicative of a bad design.

Re:memory speed? (1)

be-fan (61476) | more than 8 years ago | (#15722511)

Out of context. The slide was in a presentation about the RSX. "Local memory" here is the memory local to the RSX, not the memory local to the SPE. The slide shows that it's slow to read GPU memory from the CPU --- you should have the GPU upload to main memory instead.

Sega Saturn Redux? (4, Interesting)

ToxikFetus (925966) | more than 8 years ago | (#15720523)

As TFA mentioned, this has the potential to become another Sega Saturn boondoggle. Will the developers learn how to fully utilize this incredibly complex architecture? Relying on the "octopiler" to efficiently map to the Cell architecture seems a bit optimistic and naive.

Re:Sega Saturn Redux? (1)

CompSciStud4U (877987) | more than 8 years ago | (#15720834)

Relying on the "octopiler" to efficiently map to the Cell architecture seems a bit optimistic and naive.

Currently the "octopiler" sorta-kinda works based on previous reports. It sure doesn't come anywhere near close to using the full potential of the processor. I went to a talk at IBM about the cell, but had to sign the standard non-discloure agreement. All I'll say is that the people writing the compiler for this wish they had some input in the design of the processor.

Re:Sega Saturn Redux? (3, Interesting)

SSCGWLB (956147) | more than 8 years ago | (#15720841)

I seriously doubt they will write efficient programs in the lifetime of this console. The level of efficiency they will achieve depends on a lot of things. I didn't see it in TFA, but I am assuming you cannot treat each SPE as an individual processor.

First of all, their dream of a general 'octopiler' is pure fantasy. I have written massively parallel MPI and shared-memory applications and can testify to their complexity. Mapping an arbitrary piece of code transparently onto multiple processors is an extremely difficult task. If the source is carefully written, it is possible to parallelize certain sections, but this requires careful forethought and detailed knowledge of how the compiler works. If I were to guess, I would say they will use some type of middleware (a la CORBA) or libraries (a la MPI) to extend a programming language. That way, the programmer could specify sections of code that can be executed in parallel, which would help the compiler immensely and make for much more efficient code. It would be really cool if the SPEs had some type of identifier, allowing you to task specific SPEs! I haven't read much about the Cell, so this may or may not be possible.
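
To make the library idea concrete, explicit tasking looks something like this with standard MPI calls (illustrative of the programming model only, not of anything in the actual Cell toolchain):

    #include <mpi.h>
    #include <stdio.h>

    /* Each rank gets an identifier and works on its own slice of the
     * problem -- the kind of explicit tasking you'd want per SPE. */
    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which worker am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many workers? */

        /* Split 1024 work items across the workers (assumes size
         * divides 1024 evenly, to keep the sketch short). */
        int chunk = 1024 / size;
        double partial = 0.0;
        for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
            partial += (double)i * i;          /* stand-in for real work */

        /* Combine the partial results on rank 0. */
        double total = 0.0;
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("total = %f\n", total);

        MPI_Finalize();
        return 0;
    }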

Overall, I bet the vast majority of the parallel code will end up in carefully crafted libraries of CPU-intensive tasks. These libraries will grow over time, making utilization of the SPEs more and more efficient. Until then, the main CPU and one SPE will execute the majority of the game with occasional help from the other SPEs.

~nate

Re:Sega Saturn Redux? (2, Interesting)

Phil Wilkins (5921) | more than 8 years ago | (#15721083)

I am assuming you cannot treat each SPE as an individual processor.

Your assumption would be wrong.

Re:Sega Saturn Redux? (2, Interesting)

SSCGWLB (956147) | more than 8 years ago | (#15721352)

Thanks for the condescending and uninformative remark. What I was not sure of was whether the OS treats each SPE as a separate, autonomous core (i.e. SMP). I had assumed the context of my question made that clear. As it turns out, my assumption was correct.

"The PPE which is capable of running a conventional operating system has control over the SPEs and can start, stop, interrupt and schedule processes running on the SPEs. To this end the PPE has additional instructions relating to control of the SPEs. Despite having Turing complete architectures the SPEs are not fully autonomous and require the PPE to initiate them before they can do any useful work." Courtesy of the cell wiki [wikipedia.org]

In other words, the OS tasks the PPE, which tasks the SPEs. This is an entirely different beast from 8 autonomous cores.

I also found an interesting article [blachford.info] about programming the cell. Not all my assumptions survived *sigh*. Thanks!

~nate

Re:Sega Saturn Redux? (1)

Phil Wilkins (5921) | more than 8 years ago | (#15722188)

Thanks for the condescending and uninformative remark.

My pleasure. ;p

Yeah, the PPE has to kickstart an SPE, but after that you can treat the SPE as totally autonomous. They can fetch their own code and data, and what more do you need than that? You don't have to, though; you can manage them pretty much any way you want to. The PPE can halt an SPE, but that's a really inefficient way of doing things. Think of the size of the context you'd have to swap out to have the PPE control the threading on the SPEs.

Also I'd be wary of making assumptions as to how well Cell is being utilised.

Re:Sega Saturn Redux? (1)

tricorn (199664) | more than 8 years ago | (#15722999)

The Cyber architecture [wikipedia.org] typically had two main CPUs (60-bit) and 12-20 "Peripheral Processing Units" (PPUs), which were much lower-capacity, 12-bit processors. The CPUs were started and stopped by the PPUs, and had no interrupt architecture. Control of the system was actually in the PPUs: they loaded programs into memory, set up memory mapping, and handled context switches and system requests. The PPUs themselves were implemented as shared hardware with multiple contexts, and control actually changed to the next PPU after each instruction (which led to certain deterministic behavior that was sometimes exploited to bypass the need for interlocks in certain cases). In later versions there was a CPU instruction, "Exchange Jump", which saved the current program context and loaded the system monitor context, which could start up the next process that was ready; but that was used mostly for requesting immediate processing of a system request (normally you stuck a request in your (relocated) memory location 1, then waited for a bit to change in that word to indicate the request had been processed). Most system requests also had provision for asynchronous processing; you didn't have to wait until a request was complete, you just continued on and checked later to see if it was done. System requests were picked up by one of the PPUs scanning each process (if you didn't do an exchange jump).

Although their primary purpose was to control the details of running the CPU and (as evidenced by the name) to control peripheral devices, they could also be used to do additional processing in parallel with the main CPU. Remember, the main CPU was the faster one, and instruction execution time was on the order of a microsecond; given that today's processors are on the order of 1000 times faster, I think all you need to get good performance out of this thing is good bit-banging programmers who aren't afraid of getting down and dirty with the hardware. Lunar Lander on a vector graphics display was pretty amazing.

Re:Sega Saturn Redux? (1)

be-fan (61476) | more than 8 years ago | (#15722491)

They are autonomous cores. Indeed, the best analogy for them is a node in a network. They've got their own non-coherent local memory, and are connected via a ring bus.

The programming model for the SPEs is fairly straightforward. You bundle some code and some data into an APUlet and upload it via the ring bus to the SPE. The SPE runs that code for some amount of time, and can communicate with the rest of the chip either by sending messages over the ring bus (using a mailbox mechanism) or by doing DMAs.
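
Host-side setup in IBM's libspe2 interface looks roughly like the sketch below. Take it as an outline rather than gospel: spu_kernel is a hypothetical program handle embedded by the SPU toolchain, and error handling is kept minimal.

    #include <libspe2.h>
    #include <stdio.h>

    /* SPE program compiled by the SPU toolchain and embedded in the
     * host binary (hypothetical name). */
    extern spe_program_handle_t spu_kernel;

    int main(void)
    {
        /* Data the SPE will pull into its local store via DMA;
         * DMA transfers want 128-byte alignment. */
        static float input[4096] __attribute__((aligned(128)));

        spe_context_ptr_t spe = spe_context_create(0, NULL);
        if (!spe || spe_program_load(spe, &spu_kernel) != 0) {
            fprintf(stderr, "failed to set up SPE\n");
            return 1;
        }

        /* Run the SPU program to completion; argp hands the SPE the
         * effective address of the buffer so it can DMA the data in
         * and signal progress back through its mailbox. */
        unsigned int entry = SPE_DEFAULT_ENTRY;
        spe_stop_info_t stop;
        if (spe_context_run(spe, &entry, 0, input, NULL, &stop) < 0)
            fprintf(stderr, "SPE run failed\n");

        spe_context_destroy(spe);
        return 0;
    }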

Re:Sega Saturn Redux? (3, Interesting)

jd (1658) | more than 8 years ago | (#15720984)

Not sure it's that complex. If anything, it sounds rather limiting. Eight isolated physical coprocessors, each supporting two threads? Why not have one coprocessor that supports 16 threads that maps onto as many virtual coprocessors as desired? Basically the same circuitry, but can dynamically remap to the problem being solved, as opposed to remapping the problem to the circuits provided.


(Having the computer model itself to the problem reduces the complexity of programming and will make optimal use of the hardware. Having the program model itself after what the computer is tuned to do is merely an ugly hack and requires ugly compilers to specifically translate between the paradigms.)


The cell processor is designed around 1980s concepts of load-balancing while keeping to many of the rules of second-generation programming. Technology has moved on. That's not to say the cell is bad. It's a definite improvement over the 1960s concepts used in many modern CPUs. However, it is still 20 years behind the curve. C'mon, guys, this isn't the Space Shuttle, it's a microprocessor. There is no excuse for network and design technology to remain so far beyond the best that industrial giants are capable of producing.


Actually, it's worse than that. Modern multi-processor systems require specially-designed chipsets and become exponentially more expensive as you build them up. Single boards don't usually go beyond 16 processors. In comparison, people built single boards with 1024 Transputers without difficulty, with costs increasing linearly. So, in multi-processor architectures, we can't even match everything that could be done in the 1980s.


How does this affect those using the Cell? Well, that's simple. It doesn't offer enough of an added advantage and is different enough that coders will have difficulty making good use of it. That means that coders will have to be inefficient OR dedicated to that one chip, which has no guarantee of making any money for them. Coders won't bother, unless there is something out there that will make it a guaranteed success. I'm not seeing this killer demo.

Re:Sega Saturn Redux? (1)

Relic of the Future (118669) | more than 8 years ago | (#15721292)

"Basically the same circuitry."

Functionally? Maybe. But considering the 20% yields, would you rather lose 1/8th of the chip, or the whole thing? Also, I imagine managing the cache for that on the fly would be a significantly larger headache than dividing it up in this more consistent way; associative lookup can take up a lot of real estate real quick.

Re:Sega Saturn Redux? (1)

jd (1658) | more than 8 years ago | (#15721914)

Virtually the only thing I have heard from Sir Clive Sinclair that made me stop and think "that is so utterly the way to do it" was when he proposed a wafer-scale architecture where, instead of relying on everything working, you design with the idea that some components will be bad at the start and others will fail in use. He did this by proposing that the mechanism for selecting which element to use allow for bad elements, which the selection hardware (or software) simply ignores.


If you did this with the Cell architecture, you'd get many chips with fewer than 8 coprocessors, but I'd expect the yields to be closer to 60-70% at some level of function. Why sell bad chips? Because not everyone needs that level of compute power, even for games. Because the demonstration of graceful fault tolerance will make the chip very appealing to the military and space industries. And because using bad chips is standard practice when building cheap systems.


Oh yeah, caching would get really fun. Sixteen physical streams hooked together as eight physical processors, hooked together as a single virtual processor, hooked to a common cache. You're definitely talking about a heavyweight caching system, especially as the eviction algorithm can't rely on the presumed behaviour of a single processor. I'd have to think about that one. Finding anything in the cache would be equally hard.


My excuse there is that evolutionary computing is slow. Revolutionary designs can do in a single year what evolutionary methods can do in a decade. If you're going to invest oodles of cash into a design that alien, you might as well make it truly revolutionary. Otherwise, it won't have enough of an advantage. A slight edge in a market simply isn't going to last long enough or sell enough to cover the costs.

Re:Sega Saturn Redux? (1)

drinkypoo (153816) | more than 8 years ago | (#15722365)

Modern multi-processor systems require specially-designed chipsets and become exponentially more expensive as you build them up.

Unless, of course, you're using an AMD processor, which has HyperTransport links and becomes linearly more expensive as you build it up. Give or take.

In order to get the best performance out of the Hammer and HT you have to link the processors in more than a line, but since it's a NUMA system you can simply link them end to end. It will not be an efficient architecture for most types of problems, but it definitely could work for some.

Re:Sega Saturn Redux? (1)

be-fan (61476) | more than 8 years ago | (#15722469)

"Why not have one coprocessor that supports 16 threads that maps onto as many virtual coprocessors as desired? Basically the same circuitry, but can dynamically remap to the problem being solved, as opposed to remapping the problem to the circuits provided."

It's not the same thing *at all*. CPUs are highly non-linear: 8 2-way processors are much simpler than 1 16-way processor, because CPU structures tend to scale with the square of their width. A front-end capable of issuing 32 instructions per cycle would be gigantic. Heck, just consider the amount of register state in your 16-way processor: 8 SPEs x 128 registers/SPE = 2048 registers. To clock that register file at 3.2 GHz on a 90nm foundry process, you'd have like a dozen pipeline stages devoted to register read.

Re:Sega Saturn Redux? (0)

Anonymous Coward | more than 8 years ago | (#15725019)

8 SPEs x 128 registers/SPE = 2048 registers.



How'd you get that result?

Re:Sega Saturn Redux? (1)

be-fan (61476) | more than 8 years ago | (#15726118)

Because I'm dumb. I calculated the RHS using the original poster's mistaken notion that there were two threads per SPE, and the LHS using the actual SPE configuration.

Re:Sega Saturn Redux? (1)

be-fan (61476) | more than 8 years ago | (#15722439)

It's not really an "incredibly complex" architecture. It's different, but it's probably less complex in practice than symmetric multithreading. If you're programming specifically for Cell, it should be fairly straightforward to create pieces of code that you can run on the SPEs while doing control logic on the PPE. What will be difficult is porting existing code to the new architecture.

PS) Yes, I am a programmer. I think many discussions of Cell take it for granted that multithreaded programming is also very difficult; it's just that most PC game developers are already very familiar with it. Cell has different parallelism models --- the SPEs can be organized in a producer/consumer arrangement, or in a pipeline, etc. An MPI programmer will probably feel right at home, though he might wish for more local memory in the SPEs. Myself, given the choice between shared-memory threading and message passing, I'd take the latter in a heartbeat.
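
Since producer/consumer came up: here's a minimal sketch of that message-passing style as a bounded mailbox in plain C with POSIX threads. It's an analogy for the SPE mailbox idea, not actual Cell code.

    #include <pthread.h>
    #include <stdio.h>

    #define SLOTS 4                 /* mailbox depth */

    static int mailbox[SLOTS];
    static int head, tail, count;   /* ring-buffer state */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    static void send_msg(int msg)   /* block while the mailbox is full */
    {
        pthread_mutex_lock(&lock);
        while (count == SLOTS)
            pthread_cond_wait(&not_full, &lock);
        mailbox[tail] = msg;
        tail = (tail + 1) % SLOTS;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    static int recv_msg(void)       /* block while the mailbox is empty */
    {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        int msg = mailbox[head];
        head = (head + 1) % SLOTS;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        return msg;
    }

    static void *producer(void *arg)
    {
        (void)arg;
        for (int i = 1; i <= 10; i++)
            send_msg(i);            /* hand work items to the consumer */
        send_msg(-1);               /* sentinel: no more work */
        return NULL;
    }

    int main(void)
    {
        pthread_t p;
        pthread_create(&p, NULL, producer, NULL);
        int msg;
        while ((msg = recv_msg()) != -1)
            printf("consumed %d\n", msg);
        pthread_join(p, NULL);
        return 0;
    }

The nice property is the one I'm getting at above: the two sides share nothing except the mailbox, so there's no lock discipline spread all over the code.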

Saturn was less well planned (2, Interesting)

Nazmun (590998) | more than 8 years ago | (#15722564)

It was essentially an uber 2D platform with a 3D chip added at the last minute. The Cell, RSX, and memory type were conceived a long time ago to work together. Neither the Cell nor the graphics chip is a last-minute add-on to compete with a brand new foe (as the Saturn's was, against the PSX and its new 3D capability).

Also, Sony is hard at work on dev kits that will make programming with the Cell much easier. How well they succeed in making these dev kits will be the primary factor in how programming for the beast goes.

True, kind of (1)

aliquis (678370) | more than 8 years ago | (#15723287)

No, only the gyro is a last-minute add-on to compete with a brand new foe ;)

So what's new? (1)

Crussy (954015) | more than 8 years ago | (#15720588)

This article might not be an exact dupe, but this same information has been posted countless times already. 90% of it is even readable in the Cell's Wikipedia article [wikipedia.org]. I don't think anything more about the Cell is newsworthy until someone actually does something with the processor...

Not NINE processors, only EIGHT, since... (4, Interesting)

Harry Balls (799916) | more than 8 years ago | (#15720605)

...on average, one of the slave processors is non-functional.
Read more about the yield problems of the Cell chip here:
http://theinquirer.net/default.aspx?article=32978/ [theinquirer.net]

Fabrication yield is estimated at only 10% to 20%, which is very low for the industry.

Re:Not NINE processors, only EIGHT, since... (2, Interesting)

Anonymous Coward | more than 8 years ago | (#15721255)

Fabrication yield is estimated at only 10% to 20%

That's for a completely working package, the Cell with all 8 SPEs functional. Because of the low yield of "perfect" processors, the PS3 will be using the ones with 7 working SPEs, since there are plenty of those. The IBM discussion linked by the Inquirer shows that.

Yield is so low due not only to the complexity but also to the size. If there are an average of 10 defects on a wafer and you can only fit 10 processors on a wafer (these numbers pulled totally out of my ass), then you're basically hoping that those 10 defects won't be spread out evenly. If you can fit 1000 processors on a wafer, those 10 defects can kill 10 processors and you're still doing just fine. Of course, as we go down in process size, more things that didn't matter before can become defects. When you're working under 100 nanometers, a sub-nanometer variation can be over 1% error.
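
The usual way to put numbers on that intuition is the Poisson yield model, where the chance that a die is defect-free is exp(-D*A) for defect density D and die area A. A quick sketch, with numbers as made up as the ones above:

    #include <math.h>
    #include <stdio.h>

    /* Poisson yield model: the chance a die has zero defects is
     * Y = exp(-defect_density * die_area). Big dies lose yield fast. */
    int main(void)
    {
        double defect_density = 0.5;          /* defects per cm^2 (made up) */
        double areas[] = { 0.5, 1.0, 2.0 };   /* die areas in cm^2 (made up) */

        for (int i = 0; i < 3; i++) {
            double yield = exp(-defect_density * areas[i]);
            printf("die area %.1f cm^2 -> yield %.0f%%\n",
                   areas[i], yield * 100.0);
        }
        return 0;
    }

Double the die area and the yield drops multiplicatively, which is exactly why shipping Cells with one SPE disabled recovers so many otherwise-dead dies.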

Original Article Clarification Regarding Yields (1)

raftpeople (844215) | more than 8 years ago | (#15721604)

"Clarification Tom Reeves, IBM's VP of semiconductor and technology services, said he was not making any specific references to past or current Cell yields in an executive insight interview that ran last week. He was, instead, referring to large die yield challenges in general and the successful leverage provided by logic redundancy strategies. IBM does not release product specific yield information. This clarification was made on July 14, 2006."

Re:Not NINE processors, only EIGHT, since... (1)

be-fan (61476) | more than 8 years ago | (#15722404)

We're talking about the Cell in general, not what Sony decides to ship in the PS3.

Re:Not NINE processors, only EIGHT, since... (1)

aliquis (678370) | more than 8 years ago | (#15723348)

But how much larger is the CPU than the regular ones?

Bla, bla, bla.... (1)

Yvan256 (722131) | more than 8 years ago | (#15721804)

It gives a decent overview of the structure of the Cell processor itself, including the CBE, PPE, and SPE units.


Until I see the PS3 running those incredible games with out-of-this-world AI and physics and all, I won't buy into this whole "Cell FUD"...

"Emotion Engine", anyone?

Looks like a fun project... (1)

The_Incubator (819401) | more than 8 years ago | (#15734687)

I'll bet programming the Cell would be so much fun if you were working in a scientific or graphics research lab at a university. It has "wouldn't it be cool if..." written all over it, but I feel sympathy for the developers who will have to make code run on this thing and meet deadlines.