
AMD Unveils SSE5 Instruction Set

CowboyNeal posted about 7 years ago | from the new-and-improved dept.


mestlick writes "Today AMD unveiled its 128-bit SSE5 instruction set. The big news is that it includes 3-operand instructions, such as floating-point and integer fused multiply-add and permute. AMD posted a press release and a PDF describing the new instructions."


Who cares... (1, Insightful)

aquaepulse (990849) | about 7 years ago | (#20421049)

In 2009 I'll be holding out for SSE8 anyway.

Re:Who cares... (1)

SlappyBastard (961143) | about 7 years ago | (#20421143)

Just from a brief overview of AMD's releases, there seems to be some voodoo built in for combining iterative operations into a single execution. Of course, most things from AMD have limited meaning until they have chips in developers' hands. But this has the potential to offer more efficient processing.

The right way is go to ppc64. (0)

Anonymous Coward | about 7 years ago | (#20422551)

The REX prefix needed to reach R8-R15 bloats the code and drags down performance across the board.

All you can do is:
* buy a tri-core Xbox 360 (3.2 GHz ppc64, 512 MiB of RAM)
* buy a G5 ppc64 (PPC970, 2 GiB of RAM)
* buy an XCluster blade (PPC970)
* buy a single-core PS3 with 7 idle SPEs (256 MiB of RAM)
* or don't buy anything for another 2 years (and keep using IBM's Full System Simulator)

I want a pure 64-bit x 8 Altivec/VMX, not 32-bit.

Re:The right way is go to ppc64. (0)

Anonymous Coward | about 7 years ago | (#20422845)

Have they extended Altivec to double-precision floating point yet?

Re:Who cares... (0)

Anonymous Coward | about 7 years ago | (#20421265)

Yeah.. just another non-story.

huh (-1, Offtopic)

Anonymous Coward | about 7 years ago | (#20421093)

prost fist

Re:huh (-1, Offtopic)

Anonymous Coward | about 7 years ago | (#20421245)

You fail at trolling.

Well, I'm excited. I think. (4, Insightful)

Harik (4023) | about 7 years ago | (#20421153)

So, where's the analysis by people who write optimized media encoders/decoders? How useful are these new instructions, or are they just toys? How well did they handle context switching? What's the CX overhead? Is there a penalty for all processes, or only when you are switching to/from a SSE5 process? Will this be safely usable under all operating systems, or will they need a patch?

Re:...or are they just toys? (5, Funny)

theGreater (596196) | about 7 years ago | (#20421261)

It ROUNDSS! It ROUNDSS us! It FRCZSS! Nasty AMD added to it.

Re:...or are they just toys? (1)

Pojut (1027544) | about 7 years ago | (#20423381)

You owe me a cup of coffee and a new keyboard.

Re:...or are they just toys? (2, Funny)

ben there... (946946) | about 7 years ago | (#20428869)

Nasty AMD added to it.
The better question is: how the fuck did AMD get to write the next iteration of an Intel technology? Shouldn't it be AMD 3DNow!^2? This is like Apple deciding their next HFS filesystem will be versioned NTFS 7.0.

They can battle back and forth with version numbers and see who is first to get to 11, the version number where, for whatever reason, developers are forced to come up with a new versioning scheme. That will throw a wrench in the works. Take that, Intel!

Re:Well, I'm excited. I think. (3, Interesting)

PhrostyMcByte (589271) | about 7 years ago | (#20421335)

I don't write those fancy codecs, but I can immediately see where some of these instructions could come in handy - for instance, PCMOV and PTEST (packed cmov/test).

The new instructions take up an extra opcode byte, but seeing how they lower the number of instructions you would otherwise execute, I don't see that as a problem. The super instructions (like FMADDPS - Multiply and Add Packed Single-Precision Floating-Point) do more than just help the instruction decoder, too - they mention "infinitely precise" intermediate voodoo for several of them, which makes it seem like an FMADDPS will give a more accurate result than a MULPS/ADDPS pair.

There are new 16-bit floating point instructions too, which I can see being a boon for graphics work that wants the ease of floating point and a little more precision than 0-255 bytes give, without the large memory requirements of 32-bit floating point.
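
The PCMOV case is easy to picture: today a packed select costs three SSE2 bitwise ops, which PCMOV would collapse into a single three-operand instruction. A minimal C sketch of the operation being fused (the helper name is made up, not AMD's):

    #include <emmintrin.h>  /* SSE2 */

    /* result = (mask & a) | (~mask & b), bit by bit -- the select that
       SSE5's PCMOV would perform in one instruction. */
    static __m128i select128(__m128i mask, __m128i a, __m128i b)
    {
        return _mm_or_si128(_mm_and_si128(mask, a),
                            _mm_andnot_si128(mask, b));
    }

Feeding it a mask from a compare such as _mm_cmpgt_epi32(x, y) gives a branchless per-lane maximum, the sort of pattern codecs hit constantly.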

Re:Well, I'm excited. I think. (1)

Arimus (198136) | about 7 years ago | (#20421539)

Being thick (and out of coffee): how the hell can anything be infinitely precise? Or at least, granted that something might be infinitely precise, how do you go about checking it? It might take a while to prove for all possible numbers (of which there are infinitely many, and each of which you would have to check to an infinite number of decimal places).

One of my pet peeves is statements like "infinitely precise" :)

Re:Well, I'm excited. I think. (1)

obsolete1349 (969869) | about 7 years ago | (#20421583)

A very quick Google search for "infinite precise" yielded this.

What I think you meant was, "How can the infinitely precise number be stored and accessed by a computer?" Well, that's not the same thing.

Re:Well, I'm excited. I think. (1)

obsolete1349 (969869) | about 7 years ago | (#20421591)

Guess the filter didn't like my URL... http://www.bookrags.com/research/infinite-precision-arithmetic-wcs/ [bookrags.com] in plain text.

Re:Well, I'm excited. I think. (2, Insightful)

aquaepulse (990849) | about 7 years ago | (#20421707)

And assuming that a floating-point number is represented by 128 bits, that still means there are only 2^128 (that is, 17,179,869,184) discrete values it can represent.
Sadly that's wrong: 17,179,869,184 = 2^34. I mean, is it that difficult for people writing articles to check their math?

Re:Well, I'm excited. I think. (1)

GrievousMistake (880829) | about 7 years ago | (#20422927)

Oh, be fair. It's only 28 orders of magnitude. (Base 10, anyway.)

Re:Well, I'm excited. I think. (1)

seaturnip (1068078) | about 7 years ago | (#20425575)

He took the number of values representable by a 32-bit number and multiplied it by four, since 128 = 32 * 4. Makes perfect sense!

Re:Well, I'm excited. I think. (3, Informative)

CryoPenguin (242131) | about 7 years ago | (#20422979)

Being thick (and out of coffee): how the hell can anything be infinitely precise?

The result will still eventually be stored back into a floating-point number. What it means for an intermediate computation to be infinitely precise is just that it doesn't discard any information that wouldn't inherently be discarded by rounding the end result.
When you multiply two finite numbers, the result has only as many bits as the combined inputs. So it's quite possible for a computer to keep all of those bits, then perform the addition at that full precision, and then chop the result back to 32 bits. As opposed to implementing the same operation with current instructions, which would be: multiply, (round), add, (round).
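
That difference is observable from C99's fma functions, which have exactly the multiply-exactly-then-round-once semantics described above. A small demonstration, assuming round-to-nearest and a correctly rounded fmaf (compile with -ffp-contract=off so the compiler doesn't quietly fuse the "naive" line itself):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        float a = 1.0f + 0x1p-12f;    /* a*a = 1 + 2^-11 + 2^-24, exactly */
        float c = 1.0f + 0x1p-11f;

        float naive = a * a - c;      /* a*a rounds to 1 + 2^-11, so: 0 */
        float fused = fmaf(a, a, -c); /* full product kept inside: 2^-24 */

        printf("naive = %g, fused = %g\n", naive, fused);
        return 0;
    }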

Re:Well, I'm excited. I think. (2, Informative)

gnasher719 (869701) | about 7 years ago | (#20429319)

>> Being thick (and out of coffee): how the hell can anything be infinitely precise? Or at least, granted that something might be infinitely precise, how do you go about checking it? It might take a while to prove for all possible numbers (of which there are infinitely many, and each of which you would have to check to an infinite number of decimal places).

I'll give you an example. Let's say we are working with four decimal digits instead of the 53 binary digits that standard double precision uses. Any operation behaves as if it calculated the infinitely precise result and then rounded it. For example, any result x in the range 1233.5 <= x <= 1234.5 with infinite precision will be rounded to 1234.

Now let's say we calculate x * y + z with infinite precision and round. We have x = 2469, y = 0.5, and z happens to be 0.00000000001. So x * y = 1234.5, and x * y + z is just a tiny bit larger, so the result has to be rounded up to 1235. To do this right, you need x * y with infinite precision; knowing twelve decimals wouldn't be enough. If I told you "x * y equals 1234.50000000 to twelve-digit precision", you wouldn't know how to round x * y + z. x * y could be 1234.499999996, and adding z would still give less than 1234.5, so it would need to be rounded down. Or x * y could be 1234.500000004, and x * y + z would need to be rounded up.

That is what is meant by "infinite precision": the processor guarantees to give the same result _as if_ it had used infinite precision for the calculation. In practice, it doesn't use infinite precision: about 110 binary digits of precision are enough to always get the same result.

Re:Well, I'm excited. I think. (2, Informative)

arodland (127775) | about 7 years ago | (#20429499)

The important word there is intermediate. You don't get a result of infinite precision; you get a 32-bit result (since the parent mentioned single-precision floating point). But the processor carries the right number of bits internally, and uses the right algorithms, so that the result is as if it did the multiply and add at infinite precision and then rounded to the nearest 32-bit float. That's better than multiplying two 32-bit floats into a 32-bit float and then adding that to another 32-bit float: there you're limited to 32 bits at every step, so you get intermediate precision loss.

Making sense now?

Re:Well, I'm excited. I think. (1)

Arimus (198136) | about 7 years ago | (#20432305)

It did actually make sense before my post... it's just not infinitely precise in the pure sense of the word "infinite". I was being somewhat, hmmm... me; my (oh god, what's the word I'm looking for) style doesn't always come across well in textual form.

It's a couple links deep... (5, Informative)

SanityInAnarchy (655584) | about 7 years ago | (#20421449)

Read this interview at Dr. Dobb's [ddj.com]:

A floating-point matrix multiply using the new SSE5 extensions is 30 percent faster than a similar algorithm

I believe this helps gaming and other simulations.

Discrete Cosine Transformations (DCT), which are a basic building block for encoders, get a 20 percent performance improvement

And then we have the "holy shit" moment:

For example, the Advanced Encryption Standard (AES) algorithm gets a factor of 5 performance improvement by using the new SSE5 extension

If I get one of these CPUs, I'll almost certainly be encrypting my hard drives. It was already fast enough, but now...

As for existing OS support, it looks promising:

We're also working closely with the tool community to enable developer adoption -- PGI is on board, updates to the GCC compiler will be available this week, and AMD Code Analyst Performance Analyzer, AMD Performance Library, AMD Core Math Library and AMD SimNow (system emulator) are all updated with SSE5 support.

So, if you're really curious, you can download SimNow and emulate an SSE5 CPU, try to boot your favorite OS... even though they say they're not planning to ship the silicon for another two years. Given that they say the GCC patches will be out in a week, I imagine two years is plenty of time to get everything rock solid on the software end.

Re:It's a couple links deep... (0)

Anonymous Coward | about 7 years ago | (#20422547)

If I get one of these CPUs, I'll almost certainly be encrypting my hard drives. It was already fast enough, but now...
Good luck recovering any information in case of trouble.

Re:It's a couple links deep... (4, Funny)

funfail (970288) | about 7 years ago | (#20422753)

Why? Recovery is 5 times faster now.

Re:It's a couple links deep... (1)

runderwo (609077) | about 7 years ago | (#20424831)

If the encryption is performed on a block-by-block basis, it would make no difference in terms of recovery. The only time it could possibly make a difference is when a block is only partially read, but no hard drive these days will return a data block that contains an ECC error. You either get a full clean block, which you can then decrypt, or an ATA error, at which point encryption of the data is irrelevant because you get none of it anyway.

Backups. (1)

SanityInAnarchy (655584) | about 7 years ago | (#20427243)

Good luck recovering any information when your hard drive dies entirely.

Re:It's a couple links deep... (1)

cortana (588495) | more than 6 years ago | (#20439705)

Please elaborate?

Re:It's a couple links deep... (3, Informative)

gnasher719 (869701) | about 7 years ago | (#20423817)

>> And then we have the "holy shit" moment:

For example, the Advanced Encryption Standard (AES) algorithm gets a factor of 5 performance improvement by using the new SSE5 extension
If I get one of these CPUs, I'll almost certainly be encrypting my hard drives. It was already fast enough, but now...

They copied two important features from the PowerPC instruction set: fused multiply-add (calculate +/- x*y +/- z in one instruction), and the Altivec vector permute instruction, which can, among other things, rearrange 16 bytes in an arbitrary way. The latter should be really nice for AES, because AES does a lot of rearranging of 4x4 byte matrices (if I remember correctly).
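
To make the "rearranging" concrete: AES's ShiftRows step is a fixed permutation of the 16-byte state, exactly what a byte-permute instruction does in one go. A sketch using the analogous (two-operand) SSSE3 pshufb, assuming the usual column-major state layout; SSE5's PPERM is a three-operand generalization of this:

    #include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (pshufb) */

    /* AES ShiftRows as a single byte permute. Byte i of the state holds
       row i%4, column i/4; output byte i comes from input byte idx[i]. */
    static __m128i aes_shift_rows(__m128i state)
    {
        const __m128i idx = _mm_setr_epi8(0, 5, 10, 15, 4, 9, 14, 3,
                                          8, 13, 2, 7, 12, 1, 6, 11);
        return _mm_shuffle_epi8(state, idx);
    }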

Re:It's a couple links deep... (1)

RAMMS+EIN (578166) | about 7 years ago | (#20425855)

For example, the Advanced Encryption Standard (AES) algorithm gets a factor of 5 performance improvement by using the new SSE5 extension


Any idea how this stacks up against VIA's PadLock?

AES - how is speedup achieved? (2, Interesting)

Paul Crowley (837) | about 7 years ago | (#20432781)

I've just paged through the spec PDF, and I can't work out for the life of me how these instructions help you implement AES. In normal implementations AES does sixteen byte-to-word table lookups per round and these lookups take nearly all the time; they also open up a host of vulnerabilities in side channel attacks. To avoid these lookups you have to have a way of doing the GF(2^8) arithmetic directly, and I can't see any way these instructions will help.

Anyone got any guesses? Someone who understands Matsui's recent work on bitslice AES implementations better than I do? Will this implementation be resistant to lookup-based side channel attacks?

Re:Well, I'm excited. I think. (1)

GroovBird (209391) | about 7 years ago | (#20421735)

Context switching doesn't apply. There's no such thing as an SSE5 process. All non-privileged instructions on the CPU are available to the processes that run on it. The OS swaps out the full state of the CPU when switching context, so it swaps those SSE registers out as well. Therefore, the OS must know what registers to swap out, but since these instructions appear to work on the same ol' SSE/SSE2 registers, a relatively recent OS should have no problem supporting applications that use them.

Re:Well, I'm excited. I think. (1)

Harik (4023) | about 7 years ago | (#20430641)

That's what I was asking, thanks. I missed that it hadn't added any new SSE registers. Don't be so quick with the "no such thing as an SSE5 process" though - there IS such a thing as an FPU process, because of an ancient design decision from Intel that made the FPU a coprocessor. That's stuck with us right up to 64-bit processors - and they still have to emulate it in 32-bit mode.

Re:Well, I'm excited. I think. (1)

GroovBird (209391) | about 7 years ago | (#20430733)

Could you clarify this? The only thing that I'm aware of is that as part of the MMX instruction set, if you use MM registers you need to clear them (EMMS instruction) before you can use the FPU.

no additional CX overhead (1)

r00t (33219) | about 7 years ago | (#20430349)

This isn't adding new registers. It doesn't have the MMX defect. It's just more SSE stuff.

APL (3, Funny)

Citizen of Earth (569446) | about 7 years ago | (#20421155)

instructions such as floating point and integer fused multiply add and permute

So machine languages are APL-compatible these days.

Re:APL (2, Interesting)

Ilyon (1150115) | about 7 years ago | (#20421305)

I would say APL has always been compatible with the various vector/parallel machine languages. With the general but precise nature of APL expression, it should be easy to generically and efficiently parallelize/vectorize any APL interpreter for any machine architecture.

Is there much activity in marketing of current APL products? It seems like IBM is doing nothing more than supporting existing customers. Jim Brown and company established SmartArrays, which caters a specific C APL library to specific customers. MicroAPL seems to be diversifying into other areas, although they still update APLX periodically. I haven't seen much action on the open source front, although I have seen an open source APL project on Sourceforge. Is there any chance that the emergence of parallel architectures will spur a resurgence of interest in APL?

Matlab, Numpy, FORTRAN, ... (1)

m2943 (1140797) | about 7 years ago | (#20422313)

Basically, Matlab, Numpy, FORTRAN, and similar languages have the array processing features of APL with a more traditional syntax. So, interest in APL has never really disappeared.

Cryptographer's Take? (1)

JimXugle (921609) | about 7 years ago | (#20421361)

Can one of the cryptographers on slashdot comment on weather this is useful to them or not?

(yes, I am paranoid... why do you ask? are you with the CIA?)

Re:Cryptographer's Take? (1, Funny)

rts008 (812749) | about 7 years ago | (#20421481)

"...weather this is useful to them..."

The weather (www.weather.com) is dependent on where you live and what specific time frame you are inquiring about, subject to the meteorologist's report for that time frame and area.

??!!Whether???!! Hmmm... that's a whole different subject, but as I am with the CIA, why do you ask? Are you paranoid or something?

***Hmmm...jimXugle (921609)....posted....logging on server....LOGGED!

What was your question? We are from the government, we can help, honest!

Re:Cryptographer's Take? (0)

Anonymous Coward | about 7 years ago | (#20421565)

Don't worry about this guy, Jim...."Seen the oliphaunt" is code for "taken lots and lots of mind-altering, psychosis-inducing drugs". It definitely seems to give him a different perspective.

Re:Cryptographer's Take? (1)

Iam9376 (1096787) | about 7 years ago | (#20421761)

Actually, it's the FBI you would need to be concerned about, as they gather information about US citizens, whereas the CIA gathers foreign intelligence.

Re:Cryptographer's Take? (2, Insightful)

MrNaz (730548) | about 7 years ago | (#20421861)

Great! I'm glad those two organizations have such a long and distinguished history of self-restraint when it comes to the borders of their mandated spheres of operation.

Re:Cryptographer's Take? (2, Interesting)

gnasher719 (869701) | about 7 years ago | (#20423869)

'' Can one of the cryptographers on slashdot comment on weather this is useful to them or not? ''

One useful addition (copied from Altivec) is the vector permute instruction. What is clever about it in terms of cryptography is that you can translate a vector using a 256 byte translation table _without doing any memory access_ by using the vector permute instruction in a clever way. Now the execution time is completely data-independent, so one important attack vector is closed.
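
A sketch of that trick, substituting the two-operand SSSE3 pshufb for the vector permute (the function and its calling convention are illustrative): keep the whole 256-byte table in sixteen registers, index every chunk by the input's low nibbles, and keep each chunk's result only where the high nibble matches.

    #include <tmmintrin.h>  /* SSSE3 */

    /* Translate 16 bytes through a 256-entry table with no data-dependent
       memory access, hence no cache-timing leak. */
    static __m128i lookup256(__m128i in, const __m128i table[16])
    {
        const __m128i nib = _mm_set1_epi8(0x0F);
        __m128i lo  = _mm_and_si128(in, nib);
        __m128i hi  = _mm_and_si128(_mm_srli_epi16(in, 4), nib);
        __m128i out = _mm_setzero_si128();
        for (int i = 0; i < 16; i++) {
            __m128i hit = _mm_cmpeq_epi8(hi, _mm_set1_epi8((char)i));
            __m128i val = _mm_shuffle_epi8(table[i], lo);
            out = _mm_or_si128(out, _mm_and_si128(hit, val));
        }
        return out;
    }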

Can someone explain please (1)

yamamushi (903955) | about 7 years ago | (#20421429)

Can someone explain how a 64-bit processor can run 128-bit instructions, or what this actually means? Thanks

Re:Can someone explain please (2, Informative)

NeuralAbyss (12335) | about 7 years ago | (#20421563)

The 64-bit designation refers to the width of the address bus*. For example, IA-32 processors have been able to handle 64-bit integers for ages... so a 64-bit address-capable processor handling 128-bit numbers is nothing new.

* Yes, PAE was a slight deviation from a 32 bit address space, but in userspace, it's 32 bit flat memory.

Re:Can someone explain please (3, Informative)

GroovBird (209391) | about 7 years ago | (#20421691)

I believe the 64-bit designation refers to the width of the general purpose registers. This usually correlates to the address space used, but has nothing to do with the address bus. The 8086, for example, while being a 16-bit processor had a 20-bit address bus. The 8088 was a 16-bit processor, but only had an 8-bit data bus to save costs. Both were 16-bit processors, because the general purpose registers (AX, BX, CX, DX) were 16-bit.

In the x64 world, the general purpose registers are 64-bit wide. This also used to influence the width of the 'int' datatype in the C compiler, although I'm not sure that 'int' is a 64-bit integer when compiling x64 code.

Re:Can someone explain please (1)

Wyzard (110714) | about 7 years ago | (#20422697)

That means my twelve-year-old HP48 calculator has a 64-bit processor, despite having a 4-bit bus and 20-bit addresses. :-)

32-bit Genesis before 16-bit Super NES? (1)

tepples (727027) | about 7 years ago | (#20422769)

I believe the 64-bit designation refers to the width of the general purpose registers. This usually correlates to the address space used, but has nothing to do with the address bus. The 8086, for example, while being a 16-bit processor had a 20-bit address bus. The 8088 was a 16-bit processor, but only had an 8-bit data bus to save costs.
Are you implying that the Sega Genesis was 32-bit long before the 3DO and PlayStation?

Re:32-bit Genesis before 16-bit Super NES? (1)

GroovBird (209391) | about 7 years ago | (#20423989)

Well, that certainly is a good question. I'm not trying to imply anything, but the Wikipedia article on the matter clearly states that the Motorola 68000 is a 16-bit architecture even though its general purpose registers and basic arithmetic functions are 32-bit, simply because it has a 16-bit data bus.

It also states here [wikipedia.org] that a 16-bit architecture is one with a 16-bit data bus, address bus or register size. Perhaps the Motorola 68000 was never advertised as a 32-bit machine because that sort of marketing ploy was not exercised at the time?

dave

Bit count is still confusing (1)

tepples (727027) | about 7 years ago | (#20424283)

the Wikipedia article on the matter clearly states that the Motorola 68000 is a 16-bit architecture even though its general purpose registers, and basic arithmetic functions are 32-bit, simply because it has a 16-bit data bus.

It also states here [wikipedia.org] that a 16-bit architecture is one with a 16-bit data bus, address bus or register size.
Wouldn't that make the Super NES an 8-bit system? Its 65C816 CPU had 16-bit registers and an 8-bit data bus. And was the Nintendo 64 an 8-bit system because it used 8-bit RDRAM at a comparatively high clock rate for the time [wikipedia.org] ?

Perhaps the Motorola 68000 was never advertised as a 32-bit machine because that sort of marketing ploy was not exercised at the time?
Believe me, bit counts were the marketing ploy of the time.

Re:32-bit Genesis before 16-bit Super NES? (1)

be-fan (61476) | more than 6 years ago | (#20444783)

The 68k is for all intents and purposes a 32-bit machine. It had a 32-bit native word and 32-bit addresses.

Current "64-bit" CPUs have 128 bit memory busses -- that doesn't make them 128-bit.

Re:32-bit Genesis before 16-bit Super NES? (2, Informative)

Jagetwo (1133103) | about 7 years ago | (#20427475)

The Motorola 6800x and 68010 are 16-bit designs - that is, 16-bit processors with a 32-bit register file. Whenever you used 32-bit operands on those CPUs, they were slower, because the CPU was really executing them in 16-bit parts. The bus was also 16 bits wide, but with 24 address lines. It was just a forward-thinking design hiding its 16-bitness.

Re:32-bit Genesis before 16-bit Super NES? (1)

ravyne (858869) | about 7 years ago | (#20427571)

As we've read here, bit designations have no broadly-accepted definition; it's more a matter of what marketing slaps on the chip.

The 68000 is a chip capable of performing 32-bit arithmetic, but only able to load 16 bits at a time; therefore it was most efficient to rely on 16-bit values when possible (even though the extra 16 bits allowed you to do some neat tricks). Later revisions of the 68000 exposed the entire 32-bit data bus without changing the general architecture of the core; those are clearly 32-bit systems through and through. The fact that the 68000 had a 16-bit memory bus is really more a function of (a) 16-bit memory being popular at the time, and (b) a wider bus needing a package with more pins (as later versions used) rather than the 64-pin DIP, which is far easier to work with than SMT or PGA packages.

Intel sells the 386EX for embedded designs, which is a 32-bit core with a 16-bit external data bus. It helps keep the pin count low, thereby reducing the design and manufacturing complexity of the supporting system board.

I think the fairest way is to consider throughput: if you have a 32-bit core but a 16-bit memory bus, I'd classify it as a 16-bit system. On the other hand, if that 16-bit memory bus runs twice as fast as the CPU core (thereby supplying 32 bits per CPU cycle), then it's fair to call it a 32-bit system. If you re-double that again, the CPU core becomes the bottleneck, and you've still got a 32-bit system.

Modern caches muddle this up a bit, because the bus between the registers and cache can be a different width than the bus between the cache and main memory (and they might also run at different speeds). Here it's fair to use the width between registers and cache, since it's closer to the CPU - even though the bandwidth between cache and main memory may be the limiting factor.

Re:32-bit Genesis before 16-bit Super NES? (1)

Datamonstar (845886) | about 7 years ago | (#20430865)

No, you're getting confused by blast processing.

Re:Can someone explain please (1)

Paralizer (792155) | about 7 years ago | (#20426357)

This also used to influence the width of the 'int' datatype in the C compiler, although I'm not sure that 'int' is a 64-bit integer when compiling x64 code.
With GCC on x64, an int is still 4 bytes (32-bit) and a long is 8 bytes (64-bit).
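
Easy to check with a throwaway program; the output shown assumes an LP64 target such as x86-64 Linux (Win64 is LLP64, where long stays 4 bytes):

    #include <stdio.h>

    int main(void)
    {
        printf("int: %zu, long: %zu, void*: %zu\n",
               sizeof(int), sizeof(long), sizeof(void *));
        /* x86-64 Linux prints: int: 4, long: 8, void*: 8 */
        return 0;
    }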

Re:Can someone explain please (5, Informative)

forkazoo (138186) | about 7 years ago | (#20422277)

The 64-bit designation refers to the width of the address bus*. For example, IA-32 processors have been able to handle 64-bit integers for ages... so a 64-bit address-capable processor handling 128-bit numbers is nothing new.


Technically, the "bit designation" of a platform is defined as the largest number on the spec sheet which marketing is convinced customers will accept as truthful. Seriously, over the years different processors and systems have been "16 bit" or "32 bit" for any number of odd and wacky reasons. For example, the Atari Jaguar was widely touted as a 64-bit platform, and its control processor was a Motorola 68000. The Sega Genesis also had a 68k in it, and was a 16-bit platform. The thing is, Atari's marketing folks decided that since the graphics processor worked in 64-bit chunks, they could sell the system as a 64-bit platform. C'est la vie. It's an issue that doesn't just crop up in video game consoles - I just find the Jaguar a particularly amusing example.

But, yeah, having a CPU sold as one "bitness" and being able to work with a larger data size than that bitness is not unusual. The physical address bus width is indeed one common designator of bitness, just as you say. Another is the internal single address width, or the total segmented address width. Also, the size of a GPR is popular. On many platforms, some or all of those are the same number, which simplifies things.

An Athlon64, for example, has 64-bit GPRs and in theory a 64-bit address space, but it actually only cares about 48 bits of address space, and only 40 of those bits can actually be addressed by current implementations.

A 32-bit Intel Xeon has 32-bit GPRs, but an 80-bit floating point unit, the ability to do 128-bit SSE computations, 32-bit individual addresses, and IIRC a 36-bit segmented physical address space. But Intel's marketing knew that customers wouldn't believe it if they called it anything but 32-bit, since it could only address 32 bits in a single chunk. (And they didn't want it to compete with IA64!)

Tom, Jerry, and IOP (2, Informative)

tepples (727027) | about 7 years ago | (#20422797)

for example, the Atari Jaguar was widely touted as a 64 bit platform, and the control processor was a Motorola 68000.
The Jaguar had a 64-bit data bus, a 32-bit CPU "Tom" connected to the GPU, a 32-bit CPU "Jerry" connected to the sound chip, and a 32-bit MC68000 with a 16-bit connection to the data bus, used as an I/O processor (in much the same way that the PS2 uses the PS1 CPU). Some games ran their game logic on "Tom"; others (presumably those developed by programmers hired away from Genesis or Neo-Geo shops) ran it on the IOP. Pretty much only graphics operations ever used the full width of the data bus.

Re:Can someone explain please (0)

Anonymous Coward | about 7 years ago | (#20431499)

Some computer architect did an informal study on the issue about ten years back and concluded that technical people almost invariably decide the "bitness" of a processor on the basis of the width of the general-purpose integer registers in a CPU. It's been posted a number of times on comp.arch, but I don't even remember who it was who wrote it, sorry. Marketing bits are of course another kettle of fish, but who cares about them?

Re:Can someone explain please (1)

gnasher719 (869701) | about 7 years ago | (#20423917)

'' The 64-bit designation refers to the width of the address bus*. ''

Please show us any example of a processor with a 64 bit address bus. I don't think there are any in existence.

What you mean is the width of logical addresses, which is something completely different.

Re:Can someone explain please (0)

Anonymous Coward | about 7 years ago | (#20421669)

Can someone explain how a 64bit processor can run 128 bit instructions, or what this actually means?
When you learn to do math in elementary school, you tend to learn "long-hand" techniques (acting on a single digit at a time, carrying any overflow to the subsequent digit). So, you can think of yourself as a "1-digit" processor (with a digit being the 10-value equivalent of the 2-value bit). Just like how you can add/multiply a pair of two digit numbers using the appropriate carries and shifts, a 64-bit processor can add/multiply a pair of 128-bit numbers.

You could also, of course, do that in software using primitive 64-bit operations, followed by manual shifts, carries, and memory manipulation. Having support to do it directly in hardware, including quickly reading and writing values from memory, is way faster, which is why SSE improvements are interesting. (Incidentally, SSE has always been 128-bit, from what I remember, even on 32-bit processors, where it would essentially do "4-digit" calculations, using 32-bit "digits".)
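
The same long-hand scheme in C, as a minimal sketch that treats 64-bit words as "digits" (the u128 struct is illustrative):

    #include <stdint.h>

    typedef struct { uint64_t lo, hi; } u128;

    /* 128-bit addition from 64-bit pieces: add the low "digits", detect
       the carry via unsigned wraparound, propagate it into the high ones. */
    static u128 add128(u128 a, u128 b)
    {
        u128 r;
        r.lo = a.lo + b.lo;
        uint64_t carry = (r.lo < a.lo);
        r.hi = a.hi + b.hi + carry;
        return r;
    }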

Re:Can someone explain please (1)

Emetophobe (878584) | about 7 years ago | (#20422727)

There are different types of registers on any modern CPU: general purpose registers, floating point registers and SIMD (Single Instruction, Multiple Data) registers, to name a few. On a 64-bit CPU the first two types are 64-bit registers, while the SIMD registers are 128-bit.

Here [umd.edu] is a brief description of what SIMD is and what it can be used for:

Single Instruction, Multiple Data (SIMD) processors are also known as short vector processors. They enable a single instruction to process multiple pieces of data simultaneously. They work by allowing multiple pieces of data to be packed into one data word and enabling the instruction to act on each piece of data. This is useful in cases where the same operation has to be performed on large amounts of data. For example, take image processing. A common operation found in programs such as Photoshop would be to reduce the amount of red in an image by half. Assuming a 32-bit traditional processor that is Single Instruction, Single Data (SISD) and a 24-bit image, the information for one pixel would be put into one 32-bit word for processing. Each pixel would have to be processed individually. In a 128-bit SIMD processor, four 32-bit pixels could be packed into one 128-bit word and all four pixels could be processed simultaneously. Theoretically, this translates to a fourfold improvement in processing time.
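
That red-channel example looks like this with SSE2 intrinsics, as a sketch assuming 32-bit ARGB pixels with red in bits 16-23 (four pixels per 128-bit register):

    #include <emmintrin.h>  /* SSE2 */

    /* Halve the red channel of four packed 32-bit pixels at once. */
    static __m128i halve_red(__m128i px)
    {
        const __m128i red = _mm_set1_epi32(0x00FF0000);
        __m128i r    = _mm_and_si128(px, red);            /* isolate red     */
        __m128i half = _mm_and_si128(_mm_srli_epi32(r, 1),
                                     red);                /* red/2, in place */
        return _mm_or_si128(_mm_andnot_si128(red, px),    /* other channels  */
                            half);
    }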

Foundations for the GPU+CPU assimulation... (4, Insightful)

WoTG (610710) | about 7 years ago | (#20421533)

I'm not really qualified to have an opinion on this, but my guess is that these instructions will prove increasingly useful as AMD integrates the GPU and CPU. To me, it looks like they plan to make accessing what was traditionally part of the GPU a simple process (relative to accessing a GPU directly through its own pseudo-CPU APIs).

It'll take a couple of years for "SSE5" to show up in AMD chips... which happens to coincide nicely with their Fusion (combined CPU+GPU) product line plans.

Will Intel pick up on these instructions? Maybe not. Does that mean they die? No - the performance benefits in the areas where this makes the most difference will make it worthwhile. At the very least, AMD can sponsor patches to the most popular bits of OSS to earn a few PR points (and benchmark points).

Re:Foundations for the GPU+CPU assimulation... (1)

Propagandhi (570791) | about 7 years ago | (#20421629)

Not sure I'm following how these denser/more efficient instructions would result in better access to a GPU. Certainly the applications of such instructions (specifically matrices) are something that GPUs handle well, but how would this improve CPU/GPU collaboration? If anything, it just gives the CPU jobs which would otherwise have had to be hacked onto a GPU...

Re:Foundations for the GPU+CPU assimulation... (2, Interesting)

WoTG (610710) | about 7 years ago | (#20421759)

My thought was that the long-term plan is to integrate the GPU anyway (for one product line at least). While the GPU is RIGHT THERE, they will find a way to use as much of it as they can when it's not busy with 3D work... which for the average office environment is 95% of the time.

Gamers can still buy addon graphics cards, of course.

Sounds good... I hope (1)

SanityInAnarchy (655584) | about 7 years ago | (#20421641)

If they take anything close to the same attitude with their GPUs as they just did with their new CPU instruction set, that would mean we'd finally have a reasonably fast GPU with a completely open software stack.

As it is, ATI/AMD is maybe less proprietary than nVidia, but their Linux support sucks. Intel, however, typically has very good support, even though it's entirely open drivers, and apparently not sponsored much by Intel itself.

Re:Foundations for the GPU+CPU assimulation... (1)

gnasher719 (869701) | about 7 years ago | (#20424009)

'' I'm not really qualified to make an opinion on this, but my guess is that these instructions will prove increasingly useful as AMD integrates the GPU and CPU. To me, it looks like they plan to make accessing what was traditionally part of the GPU a simple process (relative to accessing a GPU directly through their own pseudo CPU api's). ''

I can't see that at all. Mostly they have copied stuff that has been present on PowerPC CPUs for ages, filled some obvious gaps in the SSE instruction set, and added single instructions for the floor(), ceil(), trunc() and round() functions. The only thing that has anything to do with GPUs is an instruction to convert four floating point numbers between 32-bit and 16-bit precision, which is otherwise a major pain if you want to do it on the CPU. And the only reason that's useful is that it lets you send FP values to the GPU using half the bandwidth.
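
To see why hardware help is welcome here, a deliberately simplified scalar float-to-half conversion (truncating, with no NaN or denormal handling); the instruction does this, correctly rounded, for four values at once:

    #include <stdint.h>
    #include <string.h>

    static uint16_t float_to_half(float f)
    {
        uint32_t x;
        memcpy(&x, &f, sizeof x);                       /* raw float bits */
        uint32_t sign = (x >> 16) & 0x8000;
        int32_t  e    = (int32_t)((x >> 23) & 0xFF) - 127 + 15; /* rebias */
        uint32_t mant = (x >> 13) & 0x3FF;      /* top 10 mantissa bits   */
        if (e <= 0)  return (uint16_t)sign;             /* underflow: +-0 */
        if (e >= 31) return (uint16_t)(sign | 0x7C00);  /* overflow: inf  */
        return (uint16_t)(sign | ((uint32_t)e << 10) | mant);
    }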

OK, another PDF on a small subset ........ (1)

frovingslosh (582462) | about 7 years ago | (#20421659)

But what I've been looking for, and am amazed that I can't seem to find, is a complete collection of all of the instructions for a current (or any recent) AMD processor. Yeah, there are lots of documents that break out a small specialized subset of the instruction set, like this PDF. But without a full instruction-set reference it doesn't do me much good. One would think that important information like this would be easy to lay one's hands on, particularly in the information age, and when the information in question is the instruction set of a large CPU manufacturer who maintains their own large website - but if such a document exists I sure can't find it. I even obtained some reference CDs from AMD hoping it might be there, but no luck.

So, since the people posting and reading here are likely to have some knowledge of the instruction set: if anyone can provide me with a link to the full instruction set (less these new instructions, I expect), I would be very grateful.

Re:OK, another PDF on a small subset ........ (1)

forkazoo (138186) | about 7 years ago | (#20422195)

But what I've been looking for, and am amazed that I can't seem to find, is a complete collection of all of the instructions for a current (or any recent) AMD processor. Yeah, there are lots of documents that break out a small specialized subset of the instruction set, like this PDF. But without a full instruction-set reference it doesn't do me much good. One would think that important information like this would be easy to lay one's hands on, particularly in the information age, and when the information in question is the instruction set of a large CPU manufacturer who maintains their own large website - but if such a document exists I sure can't find it. I even obtained some reference CDs from AMD hoping it might be there, but no luck.


I'm not sure it exists as a single comprehensive document. My AMD64 Architecture Reference Manual set is five books, if I recall correctly. They were shipped to me for free several years ago, so AMD doesn't keep the information a secret.

I think a single PDF document covering every single instruction in a current CPU in sufficient detail to be useful would almost certainly be so large as to be quite unwieldy. After all, a lot of people use the Adobe PDF reader software! And, even assuming you can view the document, you still have a pain in the ass trying to search it to find whatever you need.

What about 256 bit? (2, Insightful)

renoX (11677) | about 7 years ago | (#20422349)

For 'serious' scientific computing, people use 64-bit FP numbers, and vectors of 4 elements seem about right, so 4*64 = 256-bit SIMD seems the 'right size' for those users.

Sure, multimedia & games use lower-precision FP, so 16-bit or 32-bit FP numbers are enough there, but it's strange that AMD doesn't try to improve things for the scientific-computing niche.

Maybe it's because the change would be expensive: to be efficient, the memory bus would have to be widened from 128 bits to 256.

Re:What about 256 bit? (1)

dreamchaser (49529) | about 7 years ago | (#20422629)

You're talking about a very tiny niche of users. The money is in selling consumer and gaming products. Adding all those transistors and bus lines to satisfy a small minority of users doesn't make much business sense. AMD wants to grab the overall performance crown back, not be further pigeonholed into niche markets.

Re:What about 256 bit? (1)

LWATCDR (28044) | about 7 years ago | (#20424269)

That level of performance will probably be restricted to the new GPU based accelerator cards coming from nVidia and ATI/AMD. You may see it come to mainstream cpus when Intel and AMD merge the CPU and GPU. Since those will probably be used in Notebooks first you should see them in blades pretty quickly as well. What else would you use a GPU core on a blade for but math?

Are Intel and AMD's "x86" strongly diverging? (0)

Anonymous Coward | about 7 years ago | (#20422483)

We're used to seeing Intel and AMD introduce new features quite regularly, but I don't really have a feel for where this is going. Are we witnessing the evolution of two entirely separate architectures here?

If this trend continues then the common set of original x86 instructions could end up as a historical relic, because if your code uses only those old instructions then it might run REALLY slowly on both manufacturers' CPUs, since the advanced manufacturer-specific instructions will be sitting around idle.

Or, is each manufacturer implementing the others' special instructions too?

A question for those who are keeping track of instruction sets. :-)

Re:Are Intel and AMD's "x86" strongly diverging? (1)

GrievousMistake (880829) | about 7 years ago | (#20423073)

I doubt this will be any more disruptive than SSE1, SSE2, SSE3 or SSE4 were.
They stayed in sync when going 64-bit, after all. They can compete on speed and features, but both would lose if they destroyed the x86 platform by becoming incompatible.

AMD just forked x86 (2, Interesting)

RecessionCone (1062552) | about 7 years ago | (#20425535)

If you read the fine print, AMD is actually not implementing all of SSE4 on the Bulldozer chip which will be the first to include SSE5. This is disastrous - the SSE "brand" has always implied backwards compatibility: SSE1 contains MMX, SSE2 contains SSE1 & MMX, etc. etc. Now AMD is breaking this, since SSE5 chips will not include all of SSE4. AMD shouldn't have named these new extensions SSE5. As it is, they are forking the x86 instruction set, which is a bad thing for all of us.

Here's some more information: http://www.anandtech.com/cpuchipsets/showdoc.aspx? i=3073 [anandtech.com]
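
Practically, this is why code probes individual CPUID feature bits rather than trusting the SSE version names to nest. A sketch using GCC's cpuid.h (the SSE3/SSSE3/SSE4 bit positions are architectural; SSE5 had no published CPUID bit at announcement time):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned eax, ebx, ecx, edx;
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;
        printf("SSE3:   %s\n", (ecx & (1u << 0))  ? "yes" : "no");
        printf("SSSE3:  %s\n", (ecx & (1u << 9))  ? "yes" : "no");
        printf("SSE4.1: %s\n", (ecx & (1u << 19)) ? "yes" : "no");
        printf("SSE4.2: %s\n", (ecx & (1u << 20)) ? "yes" : "no");
        return 0;
    }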

Wow, did he get ripped off! (0)

Anonymous Coward | about 7 years ago | (#20423063)

He paid $165 each for AMD X2 3800+ CPUs?? Remind me never to buy from that NFP Enterprises place he hawks in his writeup. Sounds like a ripoff joint.

I thought SSE was Intel's... (1)

The Wicked Priest (632846) | about 7 years ago | (#20424953)

...and 3DNow! was AMD's. Doesn't seem right for AMD to be introducing an SSE variant.

Re:I thought SSE was Intel's... (1)

BrianGKUAC (919321) | about 7 years ago | (#20426415)

You're thinking of MMX

Re:I thought SSE was Intel's... (1)

ravyne (858869) | about 7 years ago | (#20427675)

Intel and AMD have a cross-licensing agreement that was reached as part of a settlement (an antitrust case against Intel, I believe) to promote cross-compatibility. Basically, the instructions are up for grabs even though each company's implementation is kept secret. One will introduce an enhancement, then the other will integrate it into their core when they can.

MMX/3DNow! are the early SIMD instructions, which used FPU resources to reduce cost and maintain drop-in compatibility with operating systems (the OS stores FP registers on a task switch, while it would not have known about any new register sets).

SSE has its own set of registers, instruction encodings, etc., so it doesn't interfere with the floating point unit (you can't mix FP and MMX code without a considerable performance hit), but to make use of SSE the OS had to be updated to support it.
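
A sketch of the tax that aliasing imposes on MMX code (the function is illustrative): after any MMX work, EMMS has to hand the register file back before x87 floating-point code runs again.

    #include <mmintrin.h>  /* MMX */

    void add_bytes_saturating(const __m64 *a, const __m64 *b,
                              __m64 *out, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = _mm_adds_pu8(a[i], b[i]);  /* unsigned saturating add */
        _mm_empty();  /* EMMS: release the aliased x87/MMX register file */
    }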

Does this impact molecular dynamics simulations? (1)

bradbury (33372) | about 7 years ago | (#20425911)

For those who actually understand real molecular nanotechnology, aka "Drexlerian" nanotechnology, you may understand that one of the real "breakthroughs" comes when you can computationally simulate the function of a 4 to 8 million atom molecular nanoassembler. Because if you can simulate one and prove that it does not violate any laws of physics, then one of the classical oppositions to real molecular nanotechnology falls [1]. The argument transitions entirely from "it can't work" (common among people oriented towards "dissing" nanotech) to "you can't build one". And as DRM, the iPhone restrictions, etc. have all shown, "can't" is very swampy territory to wade into.

Now, I know that if I've got 8 million cores such a simulation is probably feasible (and presumably bandwidth-limited by HyperTransport data transfer rates), so the question becomes how many atoms one core can handle, and that in turn depends on how effective the instruction set is at the math required for molecular dynamics simulations. So, is SSE5 any better at this, or should I be lobbying AMD for an SSE6 explicitly targeted at molecular dynamics simulations? It is not the market for business computing, but it is the market that potentially millions of "nanoengineers" will fall into.

It also goes without saying that the chip manufacturers, ubergamers and Second Life participants all have a high interest in achieving this, because pushing below ~32nm using current technology is going to get very dicey, at which point Moore's Law will have to shift from bulk atom assembly (current lithography methods) to precision atom assembly (real molecular nanoassembly).

1. There is a third argument against the simulation of a molecular nanoassembler. The argument that an atom specific design for a 4-8 million atom nanoassembler does not currently exist. The best one can point to is a few thousand atom Fine Motion Controller (http://www.imm.org/research/parts/controller/ [imm.org] ) designed by Drexler and Merkle. However the Nanoengineer software (http://www.nanoengineer-1.com/content/ [nanoengineer-1.com] ) from Nanorex allows one to design elements of an actual nanoassembler. If even a mere one thousand /. readers were to add 1 atom a day to the design in a distributed open source NanoAtHome.org (http://www.nanoathome.org/ [nanoathome.org] ) type project -- the design would be complete within 1-2 years (there is a significant amount of redundancy and therefore human intellect amplification in the atom placement in a nanoassembler). You can't simulate it without designing it first -- but if one can design 400 million transistor microprocessors then designing an 8 million atom nanoassembler shouldn't be that difficult.

Re:Does this impact molecular dynamics simulations (1)

Synic (14430) | about 7 years ago | (#20428155)

Wouldn't a theoretical quantum computer be more helpful, since you can evaluate many bit combinations simultaneously?

Re:Does this impact molecular dynamics simulations (1)

bradbury (33372) | about 7 years ago | (#20430205)

Perhaps. While molecular dynamics simulations are inherently "quantum", I have yet to see a paper which proposes how to solve the equations using a quantum computer. Perhaps it's a chicken-and-egg situation: after multi-qubit computers are common, we may see attempts at having them perform molecular dynamics simulations. Until then, the equations for molecular simulations are reasonably well defined (electrostatic interactions between nuclei surrounded by electron clouds in motion) - a non-trivial computational problem, but one we understand from a theoretical perspective and can model reasonably accurately. It is somewhat similar to simulating the formation of solar systems, but at a much different scale.

Re:Does this impact molecular dynamics simulations (1)

bdjacobson (1094909) | about 7 years ago | (#20430637)

I think you may already have what you need for the simulation of such a device. Folding@home has been pumping out protein-folding simulations for years - and especially now that we have GPGPU, I would imagine the simulation wouldn't be too difficult.

As for designing the system that you want to simulate: the thing with microprocessors is that they're very modular. You can create a register, use it 256 (or however many) times, and there's your cache. Then you build the part that interfaces the rest of the CPU with that group of registers and deals with addressing, etc., and there you've got something you can reuse again and again by making minor modifications to the gate schematic if, say, you wanted a 64-bit register instead of the 32-bit register you'd already designed. So the processors we're working with now are largely the result of standing on the shoulders of giants. The majority of the work has already been done; the engineers add a few things here and there like MMX/3DNow!, SSE, etc., and then make various architecture changes, some minor, some major (HyperTransport and the on-die memory controller come to mind).

Now, is your molecular nanoassembler modular like this? It certainly doesn't have the years of design and reusable hardware behind it that microprocessors have, so the comment about 400-million-transistor processors isn't exactly applicable as far as I can tell. One is such a mature technology that design can be, if one were on a very tight budget and simply had the resources, literally copy and paste; the other would require much initial R&D if I understand the idea correctly (you'd have to first design the mobility you need in the assembler's arms, then design the best mechanical way to implement those arms, make sure that at such small scales they can withstand the torque you need, and finally turn that into a molecular design). And then write/tweak the software to interact with your design and start the simulation, etc.

Come to think of it, it seems to me processing power would be the least of the worries; but I really don't know.