AMD Demonstrates "Teraflop In a Box"

kdawson posted more than 7 years ago | from the speedy-silicon dept.

AMD 182

UncleFluffy writes "AMD gave a sneak preview of their upcoming R600 GPU. The demo system was a single PC with two R600 cards running streaming computing tasks at just over 1 Teraflop. Though a prototype, this beats Intel to ubiquitous Teraflop machines by approximately 5 years." Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.


182 comments


well, it shouldn't be (3, Funny)

jimstapleton (999106) | more than 7 years ago | (#18194966)

It shouldn't be a TERAble FLOP at the stores anyway. Nice performance...

OK, yes, bad pun, bad spelling, you can "-1 get a real sense of humor" me now.

Texas Hold'em Mega-Mega Tourney (0)

Anonymous Coward | more than 7 years ago | (#18195668)

Imagine turning the flop on a million million hands of Texas Hold'em.

Re:well, it shouldn't be (-1, Offtopic)

justinbach (1002761) | more than 7 years ago | (#18195962)

No way is it gonna be a flop at the stores! Haven't you heard Justin Timberlake's new song?

I know what MY girlfriend's getting for her birthday...
my teraflop in a box!

Re:well, it shouldn't be (-1, Offtopic)

jimstapleton (999106) | more than 7 years ago | (#18196020)

To be honest, I'd rather gouge out my eardrums with a rusty spoon than hear that...

Compatibility (1)

mrchaotica (681592) | more than 7 years ago | (#18195012)

Even if Nvidia's CUDA is as hard as the Ars Technica article suggests, I still hope AMD either makes their chips binary compatible, or makes a compiler that works for CUDA code.

Re:Compatibility (4, Interesting)

level_headed_midwest (888889) | more than 7 years ago | (#18195210)

The chips use very different ISAs, so there's no way that binaries that run on G80 hardware will run on an R600. Heck, even the ATi R400 series (x700, x8x0) is not binary-compatible with the current R500 x1000 units. Maybe ATi will make a CUDA compiler, but since folks have already gotten going using the R500 hardware (see: http://folding.stanford.edu/ [stanford.edu]), I doubt that AMD/ATi will make a big effort to use a competitor's technology. Please correct me if I'm wrong, but I am not aware of any groups or programs that use NVIDIA hardware as number-crunchers yet.

Re:Compatibility (1)

MrHanky (141717) | more than 7 years ago | (#18196326)

That seems likely, but it should be possible to make an API like OpenGL for more general processing as well, shouldn't it? Then all you need is a driver, and your code won't be obsolete every time a new generation GPU comes out.

Re:Compatibility (4, Informative)

UncleFluffy (164860) | more than 7 years ago | (#18196408)

Even if Nvidia's CUDA is as hard as the Ars Technica article suggests, I still hope AMD either makes their chips binary compatible, or makes a compiler that works for CUDA code.

From what I saw at the demo, the AMD stuff was running under Brook [stanford.edu] . As far as I've been able to make out from nVidia's documentation, CUDA is basically a derivative of Brook that has had a few syntax tweaks and some vendor-specific shiny things added to lock you in to nVidia hardware.
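
For readers who haven't run into the streaming model that Brook and CUDA share, here is a minimal sketch (in CUDA C, with made-up names and sizes, not code from the demo) of the element-wise kernel style both expose: you write the per-element computation, and the runtime maps it across the whole stream.

    #include <cuda_runtime.h>

    // Element-wise multiply-add over a stream: each thread handles one element.
    // This is the per-element kernel style Brook pioneered and CUDA exposes
    // through explicit thread indexing.
    __global__ void saxpy(int n, float a, const float *x, const float *y, float *out)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        const int n = 1 << 20;                 // 1M elements, arbitrary demo size
        size_t bytes = n * sizeof(float);
        float *x, *y, *out;
        cudaMalloc((void **)&x, bytes);
        cudaMalloc((void **)&y, bytes);
        cudaMalloc((void **)&out, bytes);
        // ... copy real input into x and y with cudaMemcpy here ...
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y, out);
        cudaDeviceSynchronize();
        cudaFree(x); cudaFree(y); cudaFree(out);
        return 0;
    }

The Brook version of the same kernel looks much the same minus the explicit indexing, which is the parent's point about CUDA being a close derivative.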

ubiquitous (4, Insightful)

Speare (84249) | more than 7 years ago | (#18195034)

Look up 'ubiquitous' before you whine about how far behind Intel might seem to be.

Though having one demonstration will help spur the demand, and the demand will spur production, I still think it'll be five years before everybody's grandmother will have a Tf lying around on their checkbook-balancing credenza, and every PHB will have one under their desk warming their feet during long conference calls.

Not misleading at all (1)

minginqunt (225413) | more than 7 years ago | (#18195038)

Oh no.

I mean, the PS3 does 2 Teraflops! OMG, they're like 20 years ahead of Intel, who are so RUBBISH.

And what would be the theoretical floppage of, say, an Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA? I'm willing to bet it would be somewhat higher than this setup.

Re:Not misleading at all (4, Interesting)

sumdumass (711423) | more than 7 years ago | (#18195078)

Isn't the reason this is so interesting because you cannot have an Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA pushing a teraflop at the present time?

Maybe soon, but I thought it isn't possible _now_!

Re:Not misleading at all (1)

ArcherB (796902) | more than 7 years ago | (#18195112)

Isn't the reason this is so interesting because you cannot have an Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA pushing a teraflop at the present time?

Excellent point! Expect to see a nVidia/Intel partnership in 5, 4, 3, 2...

Re:Not misleading at all (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#18195144)

Stop with the "5..4..3..2..1" garbage. It's getting tiring.

Re:Not misleading at all (3, Insightful)

BobPaul (710574) | more than 7 years ago | (#18195592)

Excellent point! Expect to see a nVidia/Intel partnership in 5, 4, 3, 2...
Good call! That must be why nVidia has decided to enter the x86 chip market and Intel has significantly improved their GPU offerings, as well as indicated they may include vector units in future chips, because these companies plan to work together in the future! It's so obvious! I wish I hadn't paid attention these past 6 months, as it's clearly confused me!

Re:Not misleading at all (0, Offtopic)

Dread Pirate Skippy (963698) | more than 7 years ago | (#18196078)

Oh snap! =O

Re:Not misleading at all (2, Informative)

ArcherB (796902) | more than 7 years ago | (#18196294)

That must be why nVidia has decided to enter the x86 chip market and Intel has significantly improved their GPU offerings, as well as indicated they may include vector units in future chips, because these companies plan to work together in the future! It's so obvious! I wish I hadn't paid attention these past 6 months, as it's clearly confused me!

Sarcasm suits you well.

While Intel and nVidia may both be independently reinventing the wheel right now, neither seems to be getting very far very fast. Intel's video offerings have been poor at best and no one has seen an nVidia x86 processor. AMD has already demo'd a prototype, which means they are further along with this Fusion than both Intel and nVidia combined. I don't think it will take long for the decision makers at both of these companies to realize that the other has the missing component.

Of course, you could be right. This is pure speculation on my part and I am pretty much talking from my ass. Still, the idea makes perfect sense to me.

Re:Not misleading at all (0)

Anonymous Coward | more than 7 years ago | (#18195708)

Isn't the reason this is so interesting because you cannot have an Intel Core 2 Extreme with 2 x nVidia GTXs in a dual SLI arrangement using CUDA pushing a teraflop at the present time?
Who says you cannot? You don't have such a dastardly box because you chose not to have one.

Maybe soon, but I thought it isn't possible _now_!
If you really wanted it, you could have it right now. Stop whining. It's your own decision if you don't have one now!

Re:Not misleading at all (0)

Anonymous Coward | more than 7 years ago | (#18196524)

lol.. I'm not the one wanting it either. Doing first what could already be done still means they were the first to do it. I dunno why this is so difficult. I mean, if it was so easy and obvious, then everyone would already be doing it, right?

Give credit where credit is due. It may very well be that an Intel/nVidia system could outdo this, but nobody had tried it until this thing was shown off.

Re:Not misleading at all (1)

SP33doh (930735) | more than 7 years ago | (#18195654)

a dual 8800gtx configuration can indeed pull this off.

Re:Not misleading at all (2, Insightful)

HappySqurriel (1010623) | more than 7 years ago | (#18195806)

Well, as I see it, advertising "[some amazing benchmark] in a box" is reasonably foolish because I could produce a system with amazing theoretical performance that doesn't really perform much better than a system that costs a fraction as much ... It wasn't that long ago that you could (easily) buy motherboards that supported 2 or 4 separate processors, and people have built Quad-SLI setups; what this means is you could create a 4-processor Core 2 Duo system with a Quad-SLI GeForce 8800 GTX which (in most applications) would not perform much better than a single-processor Core 2 Duo system with a single GeForce 8800 GTX.

Re:Not misleading at all (0)

neverpsyked (578012) | more than 7 years ago | (#18196284)

I would have rated you "+1 Insightful" if it weren't for the complete lack of anything resembling proper English.

Step 1 (3, Funny)

Anonymous Coward | more than 7 years ago | (#18195040)

Step 1: Put your chip in the box.

Re:Step 1 (1)

Veetox (931340) | more than 7 years ago | (#18195276)

...Make Ars open the box, and that's the way you do it, BABY!

It's My Flop In a Box! (0)

Anonymous Coward | more than 7 years ago | (#18195416)

Make her open the box...

Step 2 (5, Funny)

Saikik (1018772) | more than 7 years ago | (#18195648)

Step 2: Don't leave your box in Boston.

One... (0)

Anonymous Coward | more than 7 years ago | (#18195766)

You cut a hole in the box.

Re:Step 1 (3, Informative)

Anonymous Coward | more than 7 years ago | (#18196306)

Step 1: Put your chip in the box.
Dude. You have to cut a hole in the box first, otherwise you will pinch your junk...err...your chip under the lid.

1 Teraflop you say? (3, Funny)

TheCreeep (794716) | more than 7 years ago | (#18195050)

How much is that in BogoMIPS?

Re:1 Teraflop you say? (1)

solevita (967690) | more than 7 years ago | (#18195088)

Or football pitch lengths?

Re:1 Teraflop you say? (5, Funny)

minginqunt (225413) | more than 7 years ago | (#18195092)

How much is that in BogoMIPS?

That's TWELFTY BAJILLION BogoMIPS. Per fortnight.

Re:1 Teraflop you say? (1)

garcia (6573) | more than 7 years ago | (#18195308)

And it had to go uphill both ways! Fuck that's fast.

Re:1 Teraflop you say? (1)

hey! (33014) | more than 7 years ago | (#18196206)

That's TWELFTY BAJILLION BogoMIPS. Per fortnight.


So "teraflop" is a unit of computational acceleration? Cool.

Re:1 Teraflop you say? (1)

clickclickdrone (964164) | more than 7 years ago | (#18195424)

>BogoMIPS?
Does dual-core give you BOGOFMIPS?

Never thought of that (3, Interesting)

arlo5724 (172574) | more than 7 years ago | (#18195070)

I might be (read: am mostly) retarded, but I never thought of using a graphics processor for anything else. With the super cards around the corner, though, it makes sense that some normal processing jobs could be farmed out to the GPU when it's not occupied with graphics duties. Does anyone know where I can find some extra info on this, or to what extent this is being implemented? My curiosity is piqued!

Re:Never thought of that (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#18195284)

You're correct, you are retarded. Using graphics cards for other (similar) tasks has been around a while now, and there have been enough /. stories on it that you missed. Get a clue and do your own research before asking others to do work for you.

Re:Never thought of that (3, Informative)

Anonymous Coward | more than 7 years ago | (#18195428)

Check out this web site: http://www.gpgpu.org/ [gpgpu.org]

It is up to date and contains a lot of related information.

WP

Re:Never thought of that (1)

clickclickdrone (964164) | more than 7 years ago | (#18195484)

Isn't there a version of Folding@Home that uses the GFX cores?

Re:Never thought of that (4, Informative)

theantipop (803016) | more than 7 years ago | (#18195650)

http://folding.stanford.edu/FAQ-ATI.html [stanford.edu]

It's still in beta AFAIK, but it has been in development for quite some time.

Wow... (0)

Howard Beale (92386) | more than 7 years ago | (#18195094)

imagine a Beow....ah, screw it.

Re:Wow... (2, Funny)

tttonyyy (726776) | more than 7 years ago | (#18195374)

imagine a Beow....ah, screw it.
Imagine a Beowulf cluster of organically connected people imagining Beowulf clusters - I'd have Quake running on you at a squillian FPS in no time!

Re:Wow... (1)

jimstapleton (999106) | more than 7 years ago | (#18195414)

Yes, but will it run Lin...

Screw it, I prefer BSD anyway.

Two words (0, Redundant)

paintballer1087 (910920) | more than 7 years ago | (#18195098)

Beowolf cluster.... I think that's all that needs said

OOOoooo (5, Interesting)

fyngyrz (762201) | more than 7 years ago | (#18195104)

it's hard to program such GPUs for anything other than graphics applications

It might be hard, but then again, it might be worthwhile. For instance (I'm a ham radio operator) I ran into a sampling shortwave radio receiver the other day. Thing samples from the antenna at 60+ MHz, thereby producing a stream of 14-bit data that can resolve everything happening below 30 MHz, or in other words, the entire shortwave spectrum and longwave and so on basically down to DC.

Now, a radio like this requires that the signal be processed; first you separate it from the rest, then you demodulate it, then you apply things like notch filters (or you can do that prior to demodulation, which is very nice); you build an automatic gain control to handle amplitude swings, provide a way to vary the bandwidth and move the filter skirts (low and high) independently... you might like to produce a "panadapter" display of the spectrum around the signal of interest, where there is a graph that lays out signal strengths for a defined distance up and down the spectrum... you might want to demodulate more than one signal at once (say, a FAX transmission into a map on the one hand, and a voice transmission of the weather on the other). And so on - I could really go on for a while.

The thing is, as with all signal processing, the more you try to do with a real-time signal, the more resources you have to dedicate. And this isn't audio, or at least, not at the early stages; a 60+ MHz stream of data requires quite a bit more in terms of how fast you have to do things to it than does an audio stream at, say, 44 KHz.

But signal processing typically uses fairly simple math; a lot of it, but you can do a lot without having to resort to real craziness. A teraflop of processing that isn't even happening on the CPU is pretty attractive. You'd have to get the data to it, and I'm thinking that would be pretty resource intensive, but between the main CPU and the GPU you should have enough "ooomph" left over to make a beautiful and functional radio interface.

There is an interesting set of tasks in the signal processing space; forming an image of what is going on under water from sound (not sonar... I'm talking about real imaging) requires lots and lots of signal processing. Be a kick to have it in a relatively standard box, with easily replaceable components. Maybe you could do the same thing above-ground; after all, it's still sound and there are still reflections that can tell you a lot (just observe a bat.)

The cool thing about signal processing is that a lot of it is like graphics, in a way; generally, you set up some horrible sequence of things to do to your data, and then thrash each sample just like you did the last one.

Anyway, it just struck me that no matter how hard it is to program, it could certainly be useful for some of these really resource intensive tasks.
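
To make the "thrash each sample just like you did the last one" point concrete, here is a rough sketch in CUDA C of a FIR filter where every output sample is computed by an independent thread. The names and filter shape are purely illustrative, not any particular radio's code, but the per-sample independence is exactly what lets a GPU chew through a wideband stream.

    #include <cuda_runtime.h>

    // Each thread computes one output sample of a FIR filter. Output i only
    // reads a fixed window of the input, so every output is independent and
    // the whole stream can be filtered in one parallel pass.
    __global__ void fir(const float *in, const float *taps, float *out,
                        int n, int ntaps)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float acc = 0.0f;
        for (int k = 0; k < ntaps; ++k) {
            int j = i - k;
            if (j >= 0)
                acc += taps[k] * in[j];   // zero-padded at the start of the stream
        }
        out[i] = acc;
    }

Decimation, mixing, and AGC have the same shape: a small amount of arithmetic applied identically to every sample, which is why they map well onto shader-style hardware.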

Re:OOOoooo (4, Insightful)

sitturat (550687) | more than 7 years ago | (#18195208)

Or you could just use the correct tool for the job - a DSP. I don't know why people insist on solving all kinds of problems with PC hardware when much more efficient solutions (in terms of performance and developer effort) are available.

Re:OOOoooo (4, Insightful)

fyngyrz (762201) | more than 7 years ago | (#18195252)

I don't know why people insist on solving all kinds of problems with PC hardware when much more efficient solutions (in terms of performance and developer effort) are available.

Simple: they aren't available. PC's don't typically come with DSPs. But they do come with graphics, and if you can use the GPU for things like this, it's a nice dovetail. For someone like that radio manufacturer, no need to force the consumer to buy more hardware. It's already there.

Re:OOOoooo (2, Interesting)

try_anything (880404) | more than 7 years ago | (#18195732)

You can buy a decent FPGA development board and turn it into a DSP for the price of a high-end graphics card. It isn't a trivial project to get started with, but it might be easier than using a GPU. Plus, the skills and hardware from this project will take you much farther than GPU skills.

Get started here [fpga4fun.com] and find some example DSP cores here [opencores.org] .

Re:OOOoooo (1)

fyngyrz (762201) | more than 7 years ago | (#18195804)

If you were going to go to that kind of trouble, why not buy a chip (or entire board) designed to be a DSP? Why go the FPGA route? Not trying to be nasty, I assume you have a reason for suggesting this, I just don't know what it is.

Re:OOOoooo (1)

try_anything (880404) | more than 7 years ago | (#18196286)

The original poster seems to want a lot of control and the possibility of tinkering with different configurations -- "Be a kick to have it in a relatively standard box, with easily replaceable components." Working with FPGAs gives you that software-like ability to create or download new components and rearrange them to fit your needs. A DSP board gives you one fixed layout of components. Plus, you can have fun turning the FPGA into anything else you want.

Re:OOOoooo (1)

Jeff DeMaagd (2015) | more than 7 years ago | (#18195450)

The graphics processor is basically a DSP now.

We use computers to do things they really aren't the best at, but we use the computer because it is so flexible at doing so many things cheaply; whereas a DSP in a specialized box may be better for a specific single task, the economies of scale come into play for the PC.

Re:OOOoooo (4, Insightful)

maird (699535) | more than 7 years ago | (#18195538)

A DSP probably is more efficient for that task, but you can't go down to your local WalMart and buy one. Besides, even if you could, the IC isn't much use on its own. Don't forget that you need at least a 60MHz (yes, sixty megahertz) ADC and DSP pair to do what was suggested. The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive, in the range $ridiculous-$ludicrous. Add to that the cost of getting any code written for it and the idea becomes suitable for military application only.

OTOH, the PC has a huge and varied user base, so it has the price consistent with being a mere commodity. It is general purpose and can be adapted to a large variety of tasks. It is relatively cheap to write code for and has a huge base of capable special-interest programmers. If there is a 60+MHz ADC out there somewhere for a reasonable price, then it isn't just a matter of whether a DSP is a better tool; a PC is a trivially cheap tool by comparison. You'd still need a decent UI to use an all-band direct sampling HF receiver. A PC would be good for that too, so keep it all in the same box.

You can buy non-direct-sampling receivers with DSPs in them at prices ranging from $1000 to exceeding $10000. The DSP is probably no faster than about 100kHz, so the signal has to be passed through one or more analogue IF stages to get the signal you want into the 50kHz that can be decoded. You can probably buy a PC with greater digital signal processing potential for less than $500. A 30MHz direct sampling receiver will receive and service 30MHz worth of bandwidth simultaneously. Not long after general availability, the graphics card configuration in question will probably cost less than $1000. With the processing capabilities it has, you (the human) will probably run out of ability to interpret simultaneously decoded signals before the PC runs out of ability to decode more (it's really hard to listen to two conversations at the same time on an HF radio).

Re:OOOoooo (5, Informative)

End Program (963207) | more than 7 years ago | (#18196522)

Don't forget that you need at least a 60MHz (yes, sixty megahertz) ADC and DSP pair to do what was suggested. The cost of building useful supporting electronics around a DSP capable of implementing a direct sampling receiver at 60MHz would be prohibitive in the range $ridiculous-$ludicrous.

Maybe there aren't any cheap DSPs available... if you aren't a hardware designer:

400 MHz DSP, $10.00: http://www.analog.com/en/epProd/0,,ADSP-BF532,00.html [analog.com]
14-bit, 65 MSPS ADC, $30.00: http://www.analog.com/en/prod/0,,AD6644,00.html [analog.com]
Catching non-designers talking smack... priceless

Re:OOOoooo (1)

compling (514537) | more than 7 years ago | (#18196484)

As others have commented, DSPs are not necessarily the most cost-efficient option. At a previous job, I ended up writing two versions of the system we were developing: one for TI's newest, hottest, pre-release DSP, and another version for the PC. I optimized the hell out of the DSP version and used every trick I knew.

In the end, I had to conclude that a dual-cpu system, still cheaper than a DSP-based solution, would blow away the DSP in terms of performance. It was a bit of a shock to me at the time.

Not sonar? (1)

dunc78 (583090) | more than 7 years ago | (#18195316)

So how is an image being formed under water using sound without using sonar? Also, I bet we could do the same thing above ground, and maybe above the water we could try to image using radio waves. Since it is using radio waves, let's call it a radar.

Re:Not sonar? (3, Insightful)

fyngyrz (762201) | more than 7 years ago | (#18195544)

You use ambient sound instead of radiating a signal yourself, and you try to resolve the entire environment, rather than just the sound emitting elements in the environment. This makes you a lot harder to detect; it also makes resolving what is going on a lot more difficult. Hence the need for lots of CPU power. In the water or in the air. Passive sonar - at least typically - is intended to resolve (for instance) a ship or a weapon that is emitting noise. But the sea is emitting noise all the time - waves, fish burping, whale calls, shrimp clicking - all kinds of noise, really. Using that noise as the detecting signal is the trick, and it isn't very similar to normal sonar, in terms of what kind of computations or results are required. Classic sonar gives you a range and bearing; this kind of thing is aimed at giving you an actual picture of the environment. It's a lot harder to do, but man, is it cool.

Re:OOOoooo (1)

maxume (22995) | more than 7 years ago | (#18195546)

You've seen http://gnuradio.org/trac [gnuradio.org] and http://www.ettus.com/ [ettus.com] ?

Re:OOOoooo (1)

fyngyrz (762201) | more than 7 years ago | (#18195740)

Yes, I have. Great pointers; thanks.

Re:OOOoooo (1)

nrrd (4521) | more than 7 years ago | (#18196746)

I think underwater exploration is really interesting, but know almost nothing about it. I'm curious what you mean by "forming an image of what is going on under water from sound (not sonar... I'm talking about real imaging)". Do you mean a full-on photographic quality image? Something like side-scan radar? Would you mind posting more? I'm not sure what you mean and I'd like to learn a little about this.

Intel? (0)

PFI_Optix (936301) | more than 7 years ago | (#18195120)

Shouldn't we be talking about nVidia, since this is a GPU?

Ding Dong (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#18195540)

My schlong is long and looks like a ding dong. All women welcome.

Re:Intel? (1)

TheDreadSlashdotterD (966361) | more than 7 years ago | (#18195878)

Say what you want about intel graphics chipsets, but they make those too.

Re:Intel? (1)

PFI_Optix (936301) | more than 7 years ago | (#18196172)

Can I buy two Intel GPU-based cards and team them in an attempt to match AMD's (really, ATI's) performance?

Can I buy a motherboard with this Tflop technology integrated?

Apples and oranges. I suspect fanboyism.

DVI??? (-1, Offtopic)

sumdumass (711423) | more than 7 years ago | (#18195218)

Are the DVI connections really all that much better? I haven't seen too much of a difference with them. I doubt most of my normal customers would see the difference either.

Or is it just a case where I have always used the cheapest monitors and you need a higher-end one to see the difference? If this is the case, then would a cheap video card be sufficient to get noticeable results? I have found the higher-end monitors look just as good or just as poor regardless of how they are connected (with video cards that have both VGA and DVI connectors). I always assumed it was the card or the monitor itself causing the issues, if any.

lol what? (0)

Anonymous Coward | more than 7 years ago | (#18195504)


   

Re:DVI??? (1)

sumdumass (711423) | more than 7 years ago | (#18196708)

ehh?? I posted this in this article's discussion [slashdot.org] .

I have no idea how it ended up here. I didn't have this story open yet when posting this. Ohh well. Shit happens..lol

The first rule of teraflop club... (4, Insightful)

Duncan3 (10537) | more than 7 years ago | (#18195274)

Don't mention the wattage...

And the second rule of teraflop club...

Don't mention the wattage...

Back here in the real world where we PAY FOR ELECTRICITY, we're waiting for some nice FLOPS/Watt, keep trying guys.

And they announced this some time ago didn't they?

Also (2, Interesting)

Sycraft-fu (314770) | more than 7 years ago | (#18195636)

There's a real difference between getting something to happen on a quasi-DSP like a GPU and on a real, general purpose processor like a CPU. If GPUs were full out CPU replacements, well then we wouldn't have CPUs any more, would we? The problem is that they are very very fast, but only at some things. Now that's fine, because that's what they were designed for. They are made to push pixels really fast and if they can do anything else, well bonus. However it does mean that they aren't a general purpose computing replacement.

Also, the more specialized you get your DSP, the easier it is to get speed out of it. I'm sure it wouldn't be hard to design (I'm sure they already exist) a very narrow purpose DSP that does over 1 trillion floating point ops per second. However that's real different than having a CPU that will do the same, and do it across many kinds of ops.

So as nifty as shit like this might be, it is real disingenuous to pretend that they've "beat" Intel. Intel isn't talking about a graphics card, they are talking about their CPUs. By the numbers my GPU has always been faster than my CPU, as well it should. There'd be no point in paying for specialized hardware if I had general purpose hardware that was faster.

Re:Also (0)

Anonymous Coward | more than 7 years ago | (#18196194)

"I'm sure it wouldn't be hard to design (I'm sure they already exist) a very narrow purpose DSP that does over 1 trillion floating point ops per second."

A trillion? That's nothing! My super-specialised processor can do infinity calculations. Per femtosecond. They all have to be 0 + 0 though.

Re:The first rule of teraflop club... (5, Informative)

dlapine (131282) | more than 7 years ago | (#18195662)

LOL - you're complaining about wattage for 1 TF when they did it on a pair of friggin' video cards?? That's gotta be what, 500 watts total for the whole PC?

We've run several PC clusters and IBM mainframes that didn't have 1 TF of capacity. You don't want to know how much power went into them. Yes, our modern blade-based clusters are more condensed, but they're still power hogs for dual and quad core systems.

Blue Gene is considered to be a power efficient cluster and the fastest [top500.org] , but it still draws 7 kW per rack of 1024 CPUs [ibm.com] . At 4.71 TF per rack, even Blue Gene pulls about 1.5 kW per teraflop.

Yes, it's a pair of video cards, and not a general purpose CPU, but your average user doesn't have the ability to program and use a Blue Gene style solution either. They just might get some real use out of this with a game physics engine that taps into this computing power.

This is cool.

Re:The first rule of teraflop club... (2, Informative)

Duncan3 (10537) | more than 7 years ago | (#18196544)

Count real, usable FLOPS. GPU's don't win.

But for ~$500, it's what's going to be used.

It isn't that they are hard to use for more... (3, Informative)

Assmasher (456699) | more than 7 years ago | (#18195304)

...generic purposes, it is that they (GPUs) are better suited to certain types of operations. Image processing, as an example, is very well suited to running on a GPU because the GPU excels at addressing and operating on elements of arrays (textures, basically). I've used it as a proof of concept at work for processing large numbers of video feeds simultaneously for things like photometric normalization, image stabilization, et cetera, and the things are awesome. They work well in this scenario because the problem I'm trying to solve fits the caveats of using the GPU well: slow upload of data, miraculously fast action upon that data, slow download of the data. Now, slow is relative, and getting more and more relative as new chipsets are released.

The actual framework for doing this is relatively simple, although it certainly did help that I have a background in OpenGL and DirectX Graphics (so I've done shader work before); however, again, progress is removing those caveats as well. Generic GPU programming toolsets are imminent; the only problem is that ATI has no interest in their toolsets working on nVidia hardware, and nVidia has even less interest in their toolset(s) running on ATI hardware. Something we'll just have to learn to deal with.

BTW, DirectX 10 will make this a little easier as well, with changes to how you have to pipeline data in order to operate on it in a particular fashion.
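
A hedged sketch of the host-side shape of that "slow upload, fast compute, slow download" pattern, written in CUDA C rather than the shader framework the parent describes; the kernel and names are made up for illustration.

    #include <cuda_runtime.h>

    // Any per-pixel operation would do here; brighten is just a stand-in.
    __global__ void brighten(float *img, int n, float gain)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) img[i] *= gain;
    }

    // Host-side shape of the pattern: the two memcpys are the slow parts,
    // the kernel launch in between is the fast part.
    void process_frame(const float *host_in, float *host_out, int n)
    {
        float *dev;
        size_t bytes = n * sizeof(float);
        cudaMalloc((void **)&dev, bytes);
        cudaMemcpy(dev, host_in, bytes, cudaMemcpyHostToDevice);   // upload
        brighten<<<(n + 255) / 256, 256>>>(dev, n, 1.2f);          // compute
        cudaMemcpy(host_out, dev, bytes, cudaMemcpyDeviceToHost);  // download
        cudaFree(dev);
    }

The design point is the same one the parent makes: the transfers over the bus dominate, so the technique pays off when you can batch a lot of work between the upload and the download.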

Notpick (4, Informative)

91degrees (207121) | more than 7 years ago | (#18195314)

That should be Teraflops. Flops is Floating-point operations per second, so always has an s on the end even if singular.

Re:Notpick (0)

Anonymous Coward | more than 7 years ago | (#18196216)

er, nitpick?

Re:Notpick (4, Funny)

91degrees (207121) | more than 7 years ago | (#18196478)

Yup. It's the law. Any post pointing out an error must have at least one error itself.

Re:Notpick (0)

Anonymous Coward | more than 7 years ago | (#18196750)

Just like kilobyte, megabyte, terabyte is always plural, right? Or like kilogram, for that matter, never comes without an s? Because FLOP itself is singular, of course.

I think you're misguided. 1 teraflop is singular.

Re:Notpick (0)

Anonymous Coward | more than 7 years ago | (#18196834)

I've worked in supercomputing for almost 20 years and have yet to hear or see someone put an "s" at the end of anything when 1 is the unit. It's always 1 Gigaflop, 1 Petabyte, 1 Megabit, etc, etc.

Look at http://www.top500.org/ [top500.org] and you'll see they don't put an "s" when referring to a single Teraflop.

Worthless Preview (2, Insightful)

jandrese (485) | more than 7 years ago | (#18195364)

So the preview could be boiled down to: Card still in development will be faster than cards currently available for sale.

It also included some pictures of the cooling solution that will completely dominate the card. Not that a picture of a microchip with "R600" written on it would be a lot better I guess. Although the pictures are fuzzy and hard to see, it looks like it might require two separate molex connections just like the 8800s.

Aren't G5 PowerPC Macs rated at 1 TF already? (1)

david.emery (127135) | more than 7 years ago | (#18195370)

I thought the dual CPU G5 machines were rated at 1 teraflop. Certainly PowerPC AltiVec processors are super floating-point engines (but I don't know exactly how they rank at flops/mhz....)

But then maybe the issue depends on the notion of what is "ubiquitous" and Macs don't qualify. I dunno, but I'm sure someone on /. will correct me :-)

        dave

HTX (1)

Joe The Dragon (967727) | more than 7 years ago | (#18195410)

How long before they put it on the HT bus using an HTX slot?

I could use it to program my automatic toaster (2, Funny)

BrentRJones (68067) | more than 7 years ago | (#18195462)

which is fully connected to the Internet so that I can put my toast down or pop it up remotely.

Wait... from some of the other comments about electricity usage, I might be able to do away with the heating coils and use the circuits themselves to toast. That would really be an environmental plus. Wonder how it would affect the taste of the bread?

Re:I could use it to program my automatic toaster (1)

david.emery (127135) | more than 7 years ago | (#18196788)

So would the heat sinks leave 'scorch marks'? Would this lead to a redesign of heatsinks to provide branding/corporate logos on toast?

It might be kinda cool to get "Intel Inside" burnt onto a panini sandwich... :-)

        dave

General Purpose Programmers (3, Informative)

Doc Ruby (173196) | more than 7 years ago | (#18195478)

it's hard to program such GPUs for anything other than graphics applications.


"Anything other" is "general purpose", which they cover at GPGPU.org [gpgpu.org] . But the general community of global developers hasn't gotten hooked on the cheap performance yet. Maybe if someone got an MP3 encoder working on one of these hot new chips, the more general purpose programmers would be delivering supercomputing to the desktop on these chips.

Maybe if the docs were released (0)

Anonymous Coward | more than 7 years ago | (#18196052)

The problem is they have multiple new platforms and they do not release docs for the bare metal. Compare that behaviour to common architectures like x86, 68K, PPC, ARM, MIPS or the big bunch of DSPs and microcontrollers. You can get books or PDFs with all the instructions, memory ranges, timings... you get all you need to really program them, build compilers, or even design new systems around a chip, and from experience you know they do not change such things at will; some systems are decades old. Thus, investing time is better done in systems that have proven stable and are clearly well documented. They have to choose if they want to be a market standard or keep their "precious" IP. Intel must have lost all of it by now, publishing docs like how to use the SSE instructions, yeah.

Re:General Purpose Programmers (1, Interesting)

Anonymous Coward | more than 7 years ago | (#18196084)

Maybe if someone got an MP3 encoder working on one of these hot new chips, the more general purpose programmers would be delivering supercomputing to the desktop on these chips.

I'm still waiting for realtime raytracing GPUs [openrt.de] .

Swiss army spatula (0)

Joebert (946227) | more than 7 years ago | (#18195534)

Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.

In other news, Martha Stuart explains who screwdrivers don't make good hammers.

Re:Swiss army spatula (1)

Joebert (946227) | more than 7 years ago | (#18195568)

explains who screwdrivers

I'm soo ready for the weekend I'm starting at the end of words, then jumping back to the beginning to type the rest of them.

This Just In (1)

Waffle Iron (339739) | more than 7 years ago | (#18195590)

Specialized hardware units rack up impressive benchmark numbers on specific tasks relative to general-purpose CPUs. News at 11.

No, Ars didn't say why. Here's why. (4, Informative)

Animats (122034) | more than 7 years ago | (#18195638)

Ars has an article exploring why it's hard to program such GPUs for anything other than graphics applications.

No, Ars has an article blithering that it's hard to program such GPUs for anything other than graphics applications. It doesn't say anything constructive about why.

Here's a reasonably readable tutorial on doing number-crunching in a GPU [uni-dortmund.de] . The basic concepts are that "Arrays = textures", "Kernels = shaders", and "Computing = drawing". Yes, you do number-crunching by building "textures" and running shaders on them. If your problem can be expressed as parallel multiply-accumulate operations, which covers much classic supercomputer work, there's a good chance it can be done fast on a GPU. There's a broad class of problems that work well on a GPU, but they're generally limited to problems where the outputs from a step have little or no dependency on each other, allowing full parallelism of the computations of a single step. If your problem doesn't map well to that model, don't expect much.
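
As an illustration of that "independent multiply-accumulate" shape, here is a naive dense matrix multiply written as a CUDA kernel; the tutorial linked above does the equivalent with textures and fragment shaders, and the lack of tiling here is a deliberate simplification.

    #include <cuda_runtime.h>

    // Naive dense matrix multiply: every C[row][col] is an independent
    // multiply-accumulate over one row of A and one column of B. Outputs never
    // depend on each other, which is what lets the whole step run as one
    // fully parallel pass (as a shader over textures, or as below).
    __global__ void matmul(const float *A, const float *B, float *C, int n)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n && col < n) {
            float acc = 0.0f;
            for (int k = 0; k < n; ++k)
                acc += A[row * n + k] * B[k * n + col];
            C[row * n + col] = acc;
        }
    }

A problem where each output feeds into the next (a long recurrence, say) has no such parallel pass to exploit, which is the "don't expect much" case.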

Added caveat: (1)

Ayanami Rei (621112) | more than 7 years ago | (#18195974)

You don't need greater than 32-bit precision for any of the MAC ops. Usually that kind of limitation can be overcome by rethinking the algorithm, and doing some accumulation or error analysis outside of the GPU.
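
One concrete example of "rethinking the algorithm" to live within 32-bit floats (a generic illustration, not something from the Ars article) is compensated summation, which tracks the rounding error of a long single-precision accumulation instead of needing double precision:

    // Compensated (Kahan) summation: c carries the low-order bits that a plain
    // 32-bit accumulator would throw away, so a long sum stays accurate without
    // double precision. Shown as a plain host function; the same trick can be
    // applied to partial sums coming back from the GPU.
    float kahan_sum(const float *x, int n)
    {
        float sum = 0.0f, c = 0.0f;
        for (int i = 0; i < n; ++i) {
            float y = x[i] - c;
            float t = sum + y;
            c = (t - sum) - y;   // the part of y that didn't make it into sum
            sum = t;
        }
        return sum;
    }
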

Chip in a Box (5, Funny)

natrius (642724) | more than 7 years ago | (#18195702)

To all the fellas out there with geek friends to impress
It's easy to do, just follow these steps:
One: Cut a hole in a box
Two: Stick your chip in that box
Three: Make her open the box
And that's the way you do it
It's my chip in a box

What about using it for Graphics? (1)

LWATCDR (28044) | more than 7 years ago | (#18195742)

Could this be the start of some really good open-source drivers for ATI cards?
Just how much of X and OpenGL could they offload onto this card?
What about Theora, Ogg, Speex, or DivX encoding and decoding?
I know it is a radical idea, but since they are optimized for graphics and graphics-like operations, why not use them for that?

Three step process... (1)

Sampy (213238) | more than 7 years ago | (#18195744)

1. Cut a hole in a box
2. Put your chips in that box
3. Make AMD open the box

That's the way you do it
It's a teraflop in a box!

There is no such thing as a "Teraflop" (1)

Lobais (743851) | more than 7 years ago | (#18195794)

It is 1 Teraflops since it stands for Tera floating point operations per second. The ending "s" is not plural.

SuperCell (2, Informative)

Doc Ruby (173196) | more than 7 years ago | (#18195854)

The Playstation 3 [wikipedia.org] is reported to harness 2 TFLOPS [wikipedia.org] . But "only" 204 GFLOPS of that, about 10%, run on the Cell CPU. The other 1.8 TFLOPS runs on the nVidia G70 GPU. And the G70 runs shaders, which have very limited application to anything but actually rendering graphics.

The Cell itself is notoriously hard to code for. If just some extra effort can target the nVidia part, that's TWO TeraFLOPS in a $500 box. A huge leap past both AMD and Intel.

Thats cheap (1)

majortom1981 (949402) | more than 7 years ago | (#18196062)

That's not right; AMD is cheating. AMD is also using the video cards. AMD did not beat Intel. If they are going to go against Intel with processing stuff, they should do it via CPUs only. If Intel did the same thing with 2 nVidia cards in SLI, I bet they could get the same results.

Re:Thats cheap (0)

Anonymous Coward | more than 7 years ago | (#18196662)

Well, as AMD owns ATI, they only used their own products to get that result, whereas Intel would have to use someone else's products.

This reminded me of something... (0, Redundant)

Torsoboy (1057192) | more than 7 years ago | (#18196122)

1. Cut a hole in the box. 2. Put your flop in that box. 3. Make her open the box. And that's the way you do it!

Well...duh (5, Insightful)

Anonymous Coward | more than 7 years ago | (#18196154)

GPGPU is hard because we're still in the very early days of this particular revolution. As I think about it, and from what we know of AMD's plans in particular, I think this is kind of like the evolution of FPU.

See, in the early days the FPU was a separate chip (anyone remember buying an 80387 to plug into their mobo?). Writing code to use the FPU was also a complete pain in the ass, because you had to use assembly, with all the memory management and interrupt handling headaches inherent in that. FPUs from different vendors weren't guaranteed to have completely compatible instruction sets. Because it was such a pain in the ass, only highly special purpose applications made use of FPU code. (And it's not that computer scientists hadn't thought up appropriate abstractions to make writing floating point easy. Compilers just weren't spitting out FPU code.)

Then, things began to improve. The FPU was brought on die, but as an optional component (think 486SX vs 486DX). Languages evolved to support FPUs, hiding all the difficulty under suitable abstractions so programmers could write code that just worked. More applications began to make use of floating point capabilities, but very few required an FPU to work.

Finally, the FPU was brought on die as a bog-standard part of the CPU. At that point, FPU capabilities could be taken for granted and an explosion of applications requiring an FPU to achieve decent performance ensued (see, for instance, most games). And writing FPU code is now no longer any more difficult than declaring type float. The compiler handles all the tricky parts.

I think GPGPU will follow a similar trajectory. Right now, we're in phase one. Using a GPU for general purpose computation is such an incredible pain that only the most specialized applications are going to use GPGPU capabilities. High-level languages haven't really evolved to take advantage of these capabilities yet. And yes, it's not as though computer scientists don't have appropriate abstractions that would make coding for GPGPU vastly easier. Eventually, GPGPU will become an optional part of the CPU. Eventually high-level languages (in addition to the C family, perhaps FORTRAN or Matlab or other languages used in scientific computing) will be extended to use GPGPU capabilities. Standards will emerge, or where hardware manufacturers fail to standardize, high-level abstraction will sweep the details under the rug. When this happens, many more applications will begin to take advantage of GPGPU capabilities. Even further down the road, GPGPU capabilities will become bog standard, at which point we'll see an explosion of applications that need these capabilities for decent performance.

Granted, the curve for GPGPU is steeper because this isn't just a matter of different instructions, but a change in memory management as well. But I think this kind of transition can and will eventually happen.

Future plans (4, Funny)

UnknowingFool (672806) | more than 7 years ago | (#18196734)

Though a prototype, this beats Intel to ubiquitous Teraflop machines by approximately 5 years."

So I take it that AMD will be ready for Vista's successor?

I put my junk in a box (0)

Anonymous Coward | more than 7 years ago | (#18196798)

Cue the "Dick In A Box" jokes...