
AMD Previews New Processor Extensions

kdawson posted more than 6 years ago | from the parallel-universes dept.


An anonymous reader writes "It has been all over the news today: AMD announced the first of its Extensions for Software Parallelism, a series of x86 extensions to make parallel programming easier. The first are the so-called 'lightweight profiling extensions.' They would give software access to information about cache misses and retired instructions so data structures can be optimized for better performance. The specification is here (PDF). These extensions have a much wider applicability than just parallel programming — they could be used to accelerate Java, .Net, and dynamic optimizers." AMD gave no timeframe for when these proposed extensions would show up in silicon.
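As a rough sketch of how a runtime might consume this kind of profiling data, here is a hypothetical C++ example. This is not AMD's actual interface: the record layout, field names, and ring-buffer structure are invented for illustration only; the linked spec defines the real formats.

    #include <atomic>
    #include <cstdint>
    #include <cstdio>

    // Hypothetical event record; the real format is defined in AMD's spec.
    struct ProfileRecord {
        uint8_t  event_id;    // e.g. 1 = cache miss, 2 = instruction retired
        uint8_t  core_id;
        uint16_t flags;
        uint32_t data;        // event-specific payload (e.g. miss latency)
        uint64_t instr_addr;  // address of the instruction that caused the event
    };

    // Ring buffer filled asynchronously in this process's address space;
    // the consumer only reads records the producer has already published.
    struct ProfileRing {
        ProfileRecord*      records;
        size_t              capacity;
        std::atomic<size_t> head;   // next slot the producer will write
        size_t              tail;   // next slot the consumer will read
    };

    // Drain pending records and hand them to a (hypothetical) dynamic optimizer.
    void drain_profile(ProfileRing& ring) {
        const size_t head = ring.head.load(std::memory_order_acquire);
        while (ring.tail != head) {
            const ProfileRecord& r = ring.records[ring.tail % ring.capacity];
            if (r.event_id == 1) {
                // A hot, cache-missing load: a JIT could reorder the fields of the
                // data structure touched at r.instr_addr, or insert prefetches.
                std::printf("cache miss at %#llx (payload %u)\n",
                            (unsigned long long)r.instr_addr, r.data);
            }
            ++ring.tail;
        }
    }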

198 comments

Is that what's holding up Barcelona? (0)

Anonymous Coward | more than 6 years ago | (#20241217)

Anybody?

Silicon Problems. (0, Interesting)

Anonymous Coward | more than 6 years ago | (#20241367)

They can't get the chips to clock up nicely as a whole; an individual chip, or a few dozen of them, can, but most are binning in the sub-2GHz category, and that's simply atrocious. No matter how much "better" they are than Intel's quad cores, Intel's are already pushing 3GHz and benchmarking roughly 50% better, which means the two architectures perform about the same clock-for-clock.

The first stab at Barcelona we're getting is going to pathetically under-perform compared to the competition.

2008: x86_64 retired. (0)

Anonymous Coward | more than 6 years ago | (#20241677)

2008: x86_64 retired because of bad performance; the instructions of the CISC x86_64 ISA carry too many prefix bytes.

x86-64 IS DEAD!!!

Let's go ppc64!!!

Let's go IBM!!! Let's go AMD-IBM!!!

Re:2008: x86_64 retired. (0)

Anonymous Coward | more than 6 years ago | (#20242623)

IBM: Please retire your archaic x86-64.
AMD: Are you sure?
IBM: Yes. You can sell cheap 1.8 GHz four-core ppc64 chips and I can sell mainframes, but the condition is that you retire x86-64, the root of evil descended from Intel's i386.
AMD: Good business!!! I accept!!!

Re:Silicon Problems. (1)

Cyclon (900781) | more than 6 years ago | (#20241929)

They can't get the chips to clock up nicely as a whole; an individual chip or a few dozen individuals can, but most of them are binning in the sub-2GHz category

Do you have a source for that, or is it just internet speculation?

Re:Silicon Problems. (2, Funny)

Anonymous Coward | more than 6 years ago | (#20242363)

94.3% of all statistics are made up on the spot.

Re:Silicon Problems. (1)

jombeewoof (1107009) | more than 6 years ago | (#20243129)

You can make up statistics to prove anything. 16% of all people know that.

Re:Silicon Problems. (1)

BillyBlaze (746775) | more than 6 years ago | (#20243387)

Only 3% of Slashdot users haven't heard that joke, and only 2% of those who have still think it's funny for the (on average) 36.4th time.

And so it goes on... (0, Offtopic)

ceeam (39911) | more than 6 years ago | (#20241235)

I wonder - amongst 16-bit "real mode", 16-bit "protected mode", 32-bit mode, 64-bit mode - how many different instruction kinds / opcodes a modern x86 CPU supports?

Re:And so it goes on... (2, Informative)

edwdig (47888) | more than 6 years ago | (#20241353)

There's very little difference between the instructions in the different modes. The memory management unit is where most of the differences are. Properly written 16-bit real mode code will still run in 16-bit protected mode. The only difference is how the segment portion of the pointer is interpreted.

As for 16-bit vs 32-bit modes, the instructions are mostly the same. A code segment is specified as being either 16 or 32 bit, and that size is the default data size used by instructions within that segment. There is a "size override" prefix which, if found immediately before an instruction, tells the CPU that the following instruction should use the opposite of the default size.

I don't remember the specifics, but 64 bit mode just continues along with the same ideas. There aren't many changes from 32 bit code to 64 bit.
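To make the size-override prefix concrete, here is a small sketch (raw x86 encodings shown as C++ byte arrays rather than assembly): the same opcode encodes both register-move widths, and the 0x66 prefix simply flips the operand size away from the segment's default.

    #include <cstdint>

    // MOV between registers uses opcode 0x89 plus a ModRM byte; the operand
    // size comes from the code segment's default, not from the opcode itself.
    // In a 32-bit code segment:
    const uint8_t mov_eax_ebx[] = { 0x89, 0xD8 };        // mov eax, ebx (default 32-bit)
    const uint8_t mov_ax_bx[]   = { 0x66, 0x89, 0xD8 };  // mov ax, bx (0x66 overrides to 16-bit)

    // In a 16-bit code segment the same bytes swap meanings: 0x89 0xD8 is
    // "mov ax, bx", and the 0x66 prefix selects the 32-bit form instead.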

Re:And so it goes on... (1)

mastermemorex (1119537) | more than 6 years ago | (#20242237)

they could be used to accelerate Java, .Net, and dynamic optimizers

So for CPU manufacturers, C++ is dead. Thanks for being so clear.

Re:And so it goes on... (2, Interesting)

Ant P. (974313) | more than 6 years ago | (#20241385)

It was at least 200 last time I read - and the source was an 80486 programming book. I think there's at least that many more in the different versions of SSE.

About 57,839 opcodes (1)

fedorowp (894507) | more than 6 years ago | (#20243175)

The number depends on how you look at it. A while ago I made a table listing every x86 instruction excluding prefixes, and it came out to 57,839 instruction/parameter combinations. That doesn't factor in the specific values passed to the opcode or in the registers, nor the differences in the chip's behavior depending on mode, how memory protection is set up, out-of-order execution, or instruction prefixes.

The large number of combinations certainly makes validation a tremendous challenge.

Re:And so it goes on... (1)

funkatron (912521) | more than 6 years ago | (#20241863)

Probably enough to start dropping a few. The 16 bit instructions could be disposed of without anyone noticing for a start.

Re:And so it goes on... (1)

TheOrquithVagrant (582340) | more than 6 years ago | (#20242441)

If only it were so. Unfortunately, it's not. There's a distressing amount of 16-bit real-mode code being executed in between power-on and your OS kernel switching into 32 or 64 bit mode even on the most modern PC.

BIOS vs. EFI? (1)

tepples (727027) | more than 6 years ago | (#20242943)

If only it were so. Unfortunately, it's not. There's a distressing amount of 16-bit real-mode code being executed in between power-on and your OS kernel switching into 32 or 64 bit mode even on the most modern PC.
Is this true only of machines that use BIOS, or is it also true of machines that use only EFI?

Re:And so it goes on... (1)

BillyBlaze (746775) | more than 6 years ago | (#20243421)

In a sense, the 16 bit instructions have been dropped, if only when running in 64-bit mode. Which is actually kind of annoying, because it means some of those old Windows 3 and DOS programs won't run without emulation.

You can get the x86/EMT64 documentation from intel (2, Informative)

Gazzonyx (982402) | more than 6 years ago | (#20241865)

If you root around Intel's site a bit, you can get the developer manuals for asm on their chips; I think there are about 5 of them at 300+ pages each. It's all the documentation; I think only 1 book is the actual language spec. Anyway, if you ask them nicely via email, they'll send the manuals to you for free. I got mine in under a week from when I emailed them. They even pay shipping.


Also, I know from asm on SPARC that many opcodes are really just variations of other ops (and/or pseudo-ops). For instance (I'm not sure of the x86 equivalent), .mul is a pseudo-op for sll (shift left logical), IIRC. And almost every op has a data-type-specific variation (byte, half, word, double, etc.) on top of that.

Re:You can get the x86/EMT64 documentation from in (0)

Anonymous Coward | more than 6 years ago | (#20243025)

Intel isn't alone in providing detailed documentation. AMD publishes its instruction set reference and detailed optimization tips too.

It would be cool if GPU manufacturers were as helpful as CPU manufacturers are!

Just performance counters? (2, Informative)

Erich (151) | more than 6 years ago | (#20241243)

Looks like there isn't a whole lot there that you couldn't get using existing performance counters and a tool like oprofile....

Re:Just performance counters? (1)

pipatron (966506) | more than 6 years ago | (#20241469)

But this could probably do it dynamically, in real time, which might be nice. Dunno, didn't RTFA of course.

Re:Just performance counters? (4, Informative)

imgod2u (812837) | more than 6 years ago | (#20242447)

Looking at the PDF, it supposedly gathers profile data in the background (in local caches on the chip itself) and dumps periodically depending on the OS/application settings. This allows it to profile on-the-fly with very little impact on application performance.

The application can then gather the information, which is stored in its address space, and do with it what it will (optimize on-the-fly).

Of particular interest is that the OS can allow the profile information to be dumped to the address space of other threads/processes as well as the one that the data is collected on. The OS controls the switching of the cached profile information during a context switch.

This is both cool (in that a secondary core/thread can help optimize the first) and scary (one thread getting access to another's instruction address information). I predict there will be exactly 42 Windows patches released 3.734 days after the service pack that allows Windows to take advantage of this feature because of security reasons.

I wish AMD and Intel teamed up for once (2, Funny)

rolfwind (528248) | more than 6 years ago | (#20241261)

and did away with the aging x86 instruction set and came up with something new.

Yeah, I know, Intel tried with Itanium.

Re:I wish AMD and Intel teamed up for once (2, Insightful)

Chris Burke (6130) | more than 6 years ago | (#20241321)

Yeah, I know, Intel tried with Itanium.

And you want them to try *again*? As far as I'm concerned, the most amazing achievement of IA64 was that they got to start over from scratch and still ended up with an ISA whose manual is even bigger than the IA32 manual! Which goes to prove that the only thing worse than an ISA developed through 20 years of engineering hackery is one developed by committee.

Re:I wish AMD and Intel teamed up for once (1)

gilesjuk (604902) | more than 6 years ago | (#20241389)

Indeed, devices at the lowest level don't always look that pretty. As Linus said, with Itanium Intel threw away all the good bits.

Re:I wish AMD and Intel teamed up for once (1)

dfghjk (711126) | more than 6 years ago | (#20242533)

"As Linus said, with Itanium Intel threw away all the good bits."

It's a good thing Linus leveraged his considerable processor architecture experience while at Transmeta. Where would they be now had he not provided useful advice like that?

Re:I wish AMD and Intel teamed up for once (1)

Chris Burke (6130) | more than 6 years ago | (#20242571)

They'd have been even worse off even sooner than what actually happened. Any other questions?

Re:I wish AMD and Intel teamed up for once (0)

Anonymous Coward | more than 6 years ago | (#20242123)

Come on dude, ISA [wikimedia.org] is 20 year old technology.

Re:I wish AMD and Intel teamed up for once (3, Insightful)

realmolo (574068) | more than 6 years ago | (#20241381)

Yup. They tried it with Itanium, and it didn't work.

The thing is, at this stage in processor design, the actual instruction set isn't all that important.

But *compilers* are more important than ever, and writing a good compiler is hard work. x86 compilers have been tweaked and improved for nearly 30 years. A new instruction set could NEVER achieve that kind of optimization.

Interestingly, the Itanium and the EPIC architecture were designed to move all the hard work of "parallel processing" to the compiler. Unfortunately, they could never get the compiler to work all that well on most kinds of code. The compiler could never really "extract" the parallelism that Itanium CPUs needed to work at full speed.

Which is *exactly* the problem we have now with our multi-core CPUs. Compilers don't know how to extract parallelism very well. It's an *incredibly* difficult problem that Intel has already thrown untold billions of dollars at. Essentially, even though Itanium/EPIC never caught on, we're having to deal with all the same problems it had, anyway.

Re:I wish AMD and Intel teamed up for once (2, Interesting)

Anonymous Coward | more than 6 years ago | (#20241793)

IBM's PPC compiler kicked the shit out of every x86 compiler. (Apples and oranges, but the quality was much better). Same for ARM's compiler and Sun's (SPARC) compiler. Fact is, x86 is the ugly girl at the party, but it gets more attention from GCC, MS, Intel, etc. Native compilers on other architectures beat the shit out of it.

Re:I wish AMD and Intel teamed up for once (3, Insightful)

jguthrie (57467) | more than 6 years ago | (#20242349)

Okay, I'll feed the troll. Tell me where I can buy an ATX (or smaller) PPC motherboard and CPU new for, oh, say $200, and I'll look at PPC again. The reason that x86 gets all the software is because it's the cheapest, it's the cheapest because all the motherboard manufacturers make motherboards for it, and all the motherboard manufacturers make motherboards for it because it gets all the software.

Map and reduce? (4, Interesting)

tepples (727027) | more than 6 years ago | (#20243117)

Compilers don't know how to extract parallelism very well. It's an *incredibly* difficult problem
It's not that compilers can't extract parallelism. It's that the C and C++ language standards lack a way to express parallelism. Often, you want to compute a function for each element in an array, resulting in a new array. In some languages, this is called map(). In Python, this is [expression_involving(el) for el in some_list]. An ideal language would provide a way to express that a function has no side effects, allowing map() to farm out different slices of the array to different CPUs. However, iterators in C++ and many other popular languages assume that the computation may have side effects, and provide no way inside the standard language to ask the compiler to break the computation into slices.
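As a concrete contrast, here is a minimal C++ sketch. OpenMP is used only as one common way to state the missing guarantee explicitly (the standard language itself offers no such annotation); the function names are made up for the example.

    #include <cmath>
    #include <vector>

    double expensive(double x) { return std::sqrt(x) * std::log(x + 1.0); }

    // Serial "map": the compiler must assume expensive() could have side effects
    // or depend on iteration order, so it cannot split the loop across cores.
    std::vector<double> map_serial(const std::vector<double>& in) {
        std::vector<double> out(in.size());
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = expensive(in[i]);
        return out;
    }

    // Here the programmer asserts that iterations are independent; only with
    // that promise can the runtime farm slices of the array out to different CPUs.
    std::vector<double> map_parallel(const std::vector<double>& in) {
        std::vector<double> out(in.size());
        #pragma omp parallel for
        for (long i = 0; i < (long)in.size(); ++i)
            out[i] = expensive(in[i]);
        return out;
    }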

Re:I wish AMD and Intel teamed up for once (1, Interesting)

Anonymous Coward | more than 6 years ago | (#20241411)

I read somewhere that modern x86 processors don't really process x86 opcodes anymore--there's a "translator" that takes the CISC x86 code and converts it into some kind of RISC code. If true, maybe they should enable a way for the processor to use that RISC code without the conversion.

Re:I wish AMD and Intel teamed up for once (1)

dunkelfalke (91624) | more than 6 years ago | (#20241667)

Yep, that's right.
Especially interesting is the Transmeta Crusoe CPU, which can load different instruction sets and translate them into its native code.

But the thing is, as far as I remember, back in those days when the Transmeta Crusoe was just near release, Linus said something like "I compiled the Linux kernel to the native Crusoe VLIW instructions and it was actually slower than the x86 code."

Re:I wish AMD and Intel teamed up for once (2, Insightful)

Chris Burke (6130) | more than 6 years ago | (#20241753)

If true, maybe they should enable a way for the processer to use that RISC code without the conversion.

I don't think that's a good idea. The internal micro-ops are machine-dependent, and they will change as the microarchitecture changes. By designing the micro-ops specific to the architecture, they can try to make the x86 instruction translate into an optimal sequence of micro-ops. As hardware functionality changes, existing x86 instructions can have the underlying ops changed to suit without you having to re-code or even re-compile your program.

For example: Barcelona is introducing 128-bit wide floating point units for SSE instructions. The previous ones were 64 bits wide, so it took two operations (and most likely two separate micro-ops) to perform a 128-bit SSE add instruction. Now it will take only one op, and the same x86 instruction can take advantage of that fact without having to know what architecture it is running on. Another example is divides, which on a machine with a hardware divide unit take only a few operations, but on a different machine require a lengthy microcode routine. Your code doesn't have to know; it just runs faster on the machine with the hardware DIV.

Not that you can't optimize your x86 code for particular architectures, or that there isn't x86 code that runs on one machine and not another -- though you can check whether the machine can run the code, and you aren't having the entire instruction set change out from underneath you. I'm just saying that exposing only one side of the CISC->RISC conversion gives the chip designers a lot of leeway, and you probably don't want to give that up.
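As a small illustration of that decoupling, here is a sketch using standard SSE intrinsics: the compiler emits the same ADDPS instruction either way, and whether the hardware executes it as one 128-bit micro-op or cracks it into two 64-bit halves is invisible to the code.

    #include <xmmintrin.h>  // SSE intrinsics

    // Adds four packed single-precision floats. This compiles to one ADDPS
    // instruction; a K8-class core may split it into two 64-bit micro-ops
    // internally, while Barcelona's 128-bit FP units can execute it as a
    // single op. The x86 code is identical in both cases.
    void add4(const float* a, const float* b, float* out) {
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out, _mm_add_ps(va, vb));
    }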

Re:I wish AMD and Intel teamed up for once (2, Interesting)

Slashcrap (869349) | more than 6 years ago | (#20241415)

and did away with the aging x86 instruction set and came up with something new.

Yeah, I know, Intel tried with Itanium.


They already did. I believe the 486 was the last CPU to run x86 instructions natively. Everything since the Pentium has decoded them into a RISC-like internal ISA, which can be changed every generation if desired. The only drawback is that a relatively small area of the chip needs to be dedicated to decoding x86 instructions into whatever the internal ISA is.

And guess what? One of the things people dislike about x86 is its variable-length instructions. It turns out that they actually lead to more compact code, and the speed gains from reduced cache usage more than make up for the effort and chip real estate expended on those decoders.

So let's stick with x86 for now, since the gains you foresee are either non-existent or tiny and are never, ever going to outweigh the drawbacks.

Re:I wish AMD and Intel teamed up for once (2, Informative)

Chris Burke (6130) | more than 6 years ago | (#20241567)

I believe the 486 was the last CPU to run x86 instructions natively.

Close: it was the original Pentium. The Pentium Pro -- which, despite a name that made it sound like a minor improvement to the Pentium for business/servers, was actually a completely new architecture -- is where they introduced the CISC->RISC conversion. This was in part to make it feasible to have out-of-order execution, which many said CISC processors would never have. Turns out they were both right and wrong.

So let's stick with x86 for now, since the gains you foresee are either non-existent or tiny and are never, ever going to outweigh the drawbacks.

As much as I hate x86 from an aesthetic point of view, I must agree with you here.

Re:I wish AMD and Intel teamed up for once (2, Informative)

Vellmont (569020) | more than 6 years ago | (#20241427)


and did away with the aging x86 instruction set and came up with something new.

They did, at least with the FP (floating point) instructions. FP instructions were based on this awful stack architecture, and it's gone away with all the SSE and 64 bit extensions.

The x86 instruction set has evolved greatly over time, and will continue to evolve. Why replace it entirely from scratch? Who's to say that an entirely new instruction set won't have a whole new host of problems?

Re:I wish AMD and Intel teamed up for once (4, Insightful)

LWATCDR (28044) | more than 6 years ago | (#20241437)

Well, we had the 68000 family, which had a much better instruction set than the x86.
We have Power and PowerPC, which have a much better instruction set than the x86.
We have ARM, which is a much better instruction set than the x86.
We have MIPS, which is pretty nice.
And we had the Alpha, and still do for a little while longer.
The problem with all of them is that they didn't run x86 code. Intel and AMD both made so much money from selling billions of CPUs that they could plow a lot of money into making the x86 the fastest pig with lipstick that the world has ever seen.
What made IA-64 such a disaster was that it was slow running x86 code.

Re:I wish AMD and Intel teamed up for once (1)

ZachPruckowski (918562) | more than 6 years ago | (#20241911)

I don't know why you aren't modded +5 (at the moment anyway), but you're precisely correct.

The number one requirement for a new instruction set is that it runs Windows and most Win32 programs at speeds comparable to existing processors. Given the size and scope of Windows, Microsoft probably can't easily port Windows, Win32, and Visual Studio's compiler over to another instruction set.

This means we either need hardware or software emulation of x86 (and possibly x86-64) on whatever new instruction set comes along. So it either has to support x86 and most x86 extensions (SSE, etc.), in which case it's an oversized x86 extension, or it has to be so much better than x86 that a processor can run x86 programs at about 80% speed. In either case, you'll still have a heck of a time getting non-OSS software ported to the new instruction set (as x86 will be "fast enough").

Re:I wish AMD and Intel teamed up for once (5, Insightful)

Criffer (842645) | more than 6 years ago | (#20242079)

Not again.

Why is this nonsense still perpetuated? The instruction set is irrelevant - it's just an interface to tell the processor what to do. Internally, Barcelona is a very nice RISC core capable of doing so many things at once it's insane. The only thing that performs better is a GPU, and that's only because they're thrown at embarrassingly parallel problems. The fastest general-purpose CPUs come from Intel and AMD, and it has nothing to do with the instruction set.

AMD64, and the new Core2 and Barcelona chips are very nice chips. 16 64-bit registers, 16 128-bit registers, complete IEEE-754 floating point support, integer and floating-point SIMD instructions, out-of-order execution, streaming stores and hardware prefetch. Add to that multiple cores with very fast busses, massive caches - with multichip cache coherency - and the ability to run any code compiled in the last 25 years. What's not to like?

Re:I wish AMD and Intel teamed up for once (2, Insightful)

Chirs (87576) | more than 6 years ago | (#20242493)

The instruction set *is* relevant to low-level designers. Working with the PowerPC instruction set is much nicer than x86... for me at least.

As for "the fastest general purpose CPUs come from Intel and AMD", have you ever looked at a Power5? It's stupid fast. Stupid expensive, too.

Re:I wish AMD and Intel teamed up for once (0)

Anonymous Coward | more than 6 years ago | (#20242835)

How the fuck do you get off saying that 16 bloody integer registers and 16 double registers make for a "very nice" chip? PowerPC chips have had 32 of each for a long, long time. Itanium at least upped that number, but we saw where that went.

Re:I wish AMD and Intel teamed up for once (2, Insightful)

Verte (1053342) | more than 6 years ago | (#20243201)

AMEN! The lack of general-purpose registers is a serious drawback of x86. The MMU is the same story: it's not that it isn't feature-packed, but it's so slow that we need a TLB, and the TLB can't handle threads, so all non-global entries need to be flushed when context switching. Yuck.

All the other features the GP mentioned, except for the last one if you mean COMPILED code, are also available on most RISC chips :P and the performance data really spoke for itself [Alphas had four times the floating point performance of the Pentium II clock for clock].

Re:I wish AMD and Intel teamed up for once (4, Interesting)

Chris Burke (6130) | more than 6 years ago | (#20242897)

Why is this nonsense still perpetuated? The instruction set is irrelevant - it's just an interface to tell the processor what to do.

Sure, now it is, since the decoding of CISC instructions into micro-ops has largely decoupled the ISA from the microarchitecture, allowing many of those neat-o performance features you mention, like out-of-order execution. However, in the past this wasn't the case, and a lot of x86's odd behaviors that seemed like good ideas when they were made were serious performance limiters. Like a global eflags register that is only partially written by various instructions (and they always write it, even if the result isn't needed).

Even today, I would say that all those RISC ISAs are better than x86, simply from the standpoint that they are cleaner, easier to decode, have fewer tricky modes to deal with, fewer odd dependencies, and all the other things that make building an actual x86 chip a pain in the arse. No, in the end it makes no difference in performance. Yet, if you had it to do all over again, building the One ISA to Rule Them All without concern for software compatibility, and you decided to make something more like x86 than Alpha, I'd slap the taste out of your mouth.

But we do have to be concerned with software compatibility, and that I think was the GP's main point. All of those other ISAs failed to dominate -- even when there were actual performance implications! -- simply because they were not x86 and hence didn't run the majority of software. IA64 failed not because it was itself all that bad, but because it couldn't run x86 software well. So when AMD came out with 64-bit backward-compatible x86, everyone stopped caring about IA64. Because it wasn't x86, and AMD64 was.

So ultimately I agree with you both, and I don't think the GP was nonsense at all. It's a very valid point -- backward compatibility is king, so x86 wins by default no matter what. Your point -- that x86 isn't actually hurting us anymore -- is just the silver lining on that cloud.

Re:I wish AMD and Intel teamed up for once (0)

Anonymous Coward | more than 6 years ago | (#20243197)

What's not to like?

The fact that it is not available in the marketplace now, and the very overclockable Q6600 is out there with a new low price of A$330???

Re:I wish AMD and Intel teamed up for once (2, Insightful)

wonkavader (605434) | more than 6 years ago | (#20242249)

No, the problem with the IA-64 was not that it was slow running x86 code. The problem was that it was slow running x86 code and not that great at running non-x86 code. Spectacular performance on non-x86 would have made it a much greater success, but it was lackluster from the start. After so long spent on designing a new chip, you'd expect some real results -- it was not much better than the alternatives. "Why bother?", the world said, and says even now.

Re:I wish AMD and Intel teamed up for once (0)

Anonymous Coward | more than 6 years ago | (#20242511)

We have the ARM which is a much better instruction set then the X86.

Sure, if you want to stick to in-order execution. As soon as you try going out of order, the implicit dependencies caused by the conditional-execution bits make it incredibly painful.

Re:I wish AMD and Intel teamed up for once (1)

glitch23 (557124) | more than 6 years ago | (#20242607)

What made the IA-64 such a disaster was that it was slow running X86 code.

IA-64 handled x86 only through hardware compatibility support, because its native instruction set did not include x86. So not only was x86 not supported natively, but the hardware support that was there was slow. That was its downfall. By retaining x86 compatibility with its 64-bit CPUs, AMD was able to jump into the 64-bit world with a better reception.

Re:I wish AMD and Intel teamed up for once (1)

nbert (785663) | more than 6 years ago | (#20241465)

Not that it would make much of a difference - in the end, most of the instruction set won't be used by programmers, and especially not by compilers (CISC vs. RISC, anyone?). But to get back to the topic: the overhead caused by upwards compatibility isn't that big after all. The problems a normal user experiences nowadays are not caused by bad hardware design.

Re:I wish AMD and Intel teamed up for once (0)

Anonymous Coward | more than 6 years ago | (#20241479)

No. The software community *doesn't* wish that someone would create a ground-up new ISA. They're created all the time by upstart embedded companies and ivory-tower professors. But the fact is, everyone knows, understands, and has learned to love x86. It's an entrenched standard with trillions of dollars of infrastructure built around it. AMD and Intel teaming up to make a new ISA would result in both of them making an interesting product with no infrastructure support, doomed to limited niche markets. See Itanium, Alpha, PA-RISC, even PowerPC.

Coward out.

Re:I wish AMD and Intel teamed up for once (1)

Dusty00 (1106595) | more than 6 years ago | (#20241497)

And Intel's failure was due to a lack of backwards compatibility. Coming up with a "radically new and advanced" architecture does little good in the tech world, because no matter how much better it is going forward, it still has to work with the technologies that got us here.
If someone invents something better than HTML, it won't matter how much better it is; the world isn't going to scrap the content on the internet for the sake of the new technology.

Re:I wish AMD and Intel teamed up for once (1)

Crazy Taco (1083423) | more than 6 years ago | (#20242425)

If someone invents something better than HTML, it won't matter how much better it is; the world isn't going to scrap the content on the internet for the sake of the new technology.

HTML 5.0 even being considered is a case in point, considering XHTML is far better from a Computer Science standpoint and has far more future potential.

Coming up with "radically new and advanced" architecture does little good in the tech world because no matter how much better it is going forward it has to still work with the technologies that got us here.

And even if we didn't have to keep the old tech, or even if the change to the new tech was really, really slight and easy, we still wouldn't make it because we would still have many, many idiots around who will refuse to learn something new or even consider a new technology. In fact, they will raise a stink for years until someone relents. It doesn't matter how much better the new technology is, how bad things were before, or how poor or illogical the arguments of those wanting to keep the old technology are: they will badger everyone until it is resurrected. Again, HTML 5 is a case in point.

Re:I wish AMD and Intel teamed up for once (1)

Ant P. (974313) | more than 6 years ago | (#20241557)

The thing is, what would they replace it with that they can sell? The only choices are emulation or translating code on the fly, both of which have sunk already.

Re:I wish AMD and Intel teamed up for once (2, Insightful)

servognome (738846) | more than 6 years ago | (#20243373)

and did away with the aging x86 instruction set and came up with something new.
I wish they'd do away with English and come up with something new - a language based on consistent & logical rules.
I don't know how anything gets done using a set of words cobbled together over hundreds of years with all sorts of special rules and idioms.

Nice, but let's get Barcelona out the door, OK? (1, Interesting)

Anonymous Coward | more than 6 years ago | (#20241289)

These extensions could be useful, but speaking as someone from the target audience... I just don't care right now. No minor improvement (such as might be gained through these) is as important to me as seeing a viable alternative to Intel. Not because I'm an AMD fanboy, but because competition brings prices down and accelerates the release of faster chips. From what I hear now, we'll finally see Barcelona chips out on September 10th at -maybe- up to 2.3 GHz if you're one of the cherished few, but most retail ones will be 1.9 GHz. I haven't seen the (valid) numbers, so I can't say for sure, but I'm worried about how competitive this will be.

I realize that the software people and the hardware people both have their own projects to work on, and they work largely independently in terms of time-frame, and I figure this news might be timed to say, "Hey! Look at us! We're doing stuff!", but it only serves to frustrate me that there still aren't any real numbers on Barcelona and that, on the whole, AMD seems to have dropped the ball. /Grumble

Re:Nice, but let's get Barcelona out the door, OK? (1, Interesting)

Pojut (1027544) | more than 6 years ago | (#20241543)

What I would like to know is how AMD got its ass handed to it so viciously by Intel with the Core 2, and yet STILL isn't even remotely close to having something that can compete?

AMD was "winning" for quite a long time...what happend that has made it impossible for them to come up with something even mildly exciting?

Re:Nice, but let's get Barcelona out the door, OK? (1, Insightful)

Anonymous Coward | more than 6 years ago | (#20241681)

A new microarchitecture is not something you bang together in a weekend. And from the looks of it, no amount of incremental tweaks to the K8 microarchitecture would be enough to catch up to Core. Barcelona was most likely in development well before Core hit the streets.

You may remember that K8 gave Netburst a similar drubbing, and yet Intel continued on with Netburst for everything but its laptop products for some time. Core has now been on the market for just over a year.

Re:Nice, but let's get Barcelona out the door, OK? (3, Insightful)

HandsOnFire (1059486) | more than 6 years ago | (#20241713)

What happened is that the P4 architecture was more of a marketing scheme to push MHz, not performance. AMD came out with an architecture aimed at high performance. Intel then came out with the Core 2 products, which also focused on performance instead of clock speed. Intel has a lead on the manufacturing-process side with respect to node size, which helps them produce a lot at a lower cost. And if you look at Intel's and AMD's financials, you'll see how much each has to spend on R&D: Intel has a lot more money to put down on more designs and more engineers than AMD does.

Re:Nice, but let's get Barcelona out the door, OK? (2)

Surt (22457) | more than 6 years ago | (#20241973)

Major architectural changes have (historically) been years apart. AMD had the lead arch, and Intel took years to respond with Core. Now Intel has the lead, and AMD won't compete until their new arch. The problem is compounded for AMD by Intel deciding to make a major push to speed up their arch cycle time. AMD's new arch will have to do battle with Intel's refined Core 2 shortly after release, and Intel's next arch is due as soon as next year, so their window is tight. AMD is of course also trying to accelerate their cycle, but Intel has a lot more money to spend on this battle.

Re:Nice, but let's get Barcelona out the door, OK? (1)

GiMP (10923) | more than 6 years ago | (#20242177)

What I would like to know is how AMD got its ass handed to it so viciously by Intel with the Core 2, and yet STILL isn't even remotely close to having something that can compete?


As I see it... the memory-bandwidth limitations of Intel's FSB are so restrictive that, for many applications, it matters little how many cores or threads their CPUs can push. The reality is that Intel's chips cannot push memory around fast enough for those processors to be worthwhile. Rather than a dual quad-core system with Intel processors, get a quad dual-core system with AMD processors. You still get 8 cores, but you also get a whole lot more memory bandwidth.
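As a rough way to see the bandwidth wall being described, here is a minimal STREAM-style sketch (a toy, not a proper benchmark: real measurements need careful array sizing, repetition, thread pinning, and NUMA-aware allocation):

    #include <chrono>
    #include <cstdio>
    #include <vector>

    // STREAM-style "triad": a[i] = b[i] + s * c[i]. With arrays far larger than
    // cache, the runtime is dominated by memory traffic rather than arithmetic,
    // so the reported GB/s approximates sustainable bandwidth for one thread.
    int main() {
        const std::size_t n = 1 << 24;  // ~16M doubles (128 MB) per array
        std::vector<double> a(n), b(n, 1.0), c(n, 2.0);
        const double s = 3.0;

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i)
            a[i] = b[i] + s * c[i];
        auto t1 = std::chrono::steady_clock::now();

        const double secs  = std::chrono::duration<double>(t1 - t0).count();
        const double bytes = 3.0 * n * sizeof(double);  // read b, read c, write a
        std::printf("triad: %.2f GB/s\n", bytes / secs / 1e9);
        return 0;
    }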

Re:Nice, but let's get Barcelona out the door, OK? (1)

imgod2u (812837) | more than 6 years ago | (#20242379)

See, with single-die, multi-core solutions, the FSB becomes much less of a limitation. A smart caching system pretty much does away with most of the problems with the exception of streaming programs (like pixel processing).

Looking at the Core 2's memory bandwidth compared to that of an X2, it doesn't seem like effective latency/bandwidth is all that lacking.

As you scale to 8+ chips with separate cache pools, the difference will show, however.

Also, keep in mind that in a NUMA architecture, you don't have one big chunk of memory to do with as you will. You have pockets of memory, and if the OS/application isn't smart enough to partition its data chunks (or if two threads share a single, fragmented data chunk and there's no replication), then you're not effectively getting more bandwidth. In fact, it will slow you down, as you'd have to go over the core-to-core interconnect (a high-latency HyperTransport link in the case of the AMD64s) to get to the memory you want.

Will Intel Adopt These Instructions? (1)

Apple Acolyte (517892) | more than 6 years ago | (#20241425)

Has there in the past been an example of AMD adding new instructions and then Intel following along and adopting them? I know it works in the converse, but somehow I doubt Intel wants AMD taking the lead in extending its own ISA.

Re:Will Intel Adopt These Instructions? (3, Informative)

The Real Nem (793299) | more than 6 years ago | (#20241493)

EM64T [wikipedia.org] ?

Re:Will Intel Adopt These Instructions? (1)

Anarke_Incarnate (733529) | more than 6 years ago | (#20241723)

Technically correct and wrong at the same time. EM64T has a kludge in the way that memory is addressed. The EM64T chips cannot access memory above 4GB without using pointers.

Re:Will Intel Adopt These Instructions? (1)

edwdig (47888) | more than 6 years ago | (#20242135)

Technically correct and wrong at the same time. EM64T has a kludge in the way that memory is addressed. The EM64T chips cannot access memory above 4GB without using pointers.

You can't access any memory without pointers.

You're probably thinking of Physical Address Extension (PAE), which lets you swap out parts of the page tables to point to memory above 4 GB. That's existed since the Pentium Pro or so. EM64T is just the damage-control name Intel's marketing department came up with for their implementation of AMD64.

Re:Will Intel Adopt These Instructions? (1)

Anarke_Incarnate (733529) | more than 6 years ago | (#20242289)

No, this is not about PAE. PAE should be irrelevant with 64-bit extensions, since they should be able to address more than 32 bits' worth of RAM; PAE was for older-generation processors. The issue is that the EM64T spec does not change the addressable amount of RAM. I wish I had the link from Red Hat about the kernel hacks that were needed to make it work.

By the way, you do not need pointers to address memory, and what I had stated was that in order to address more than 4GB of RAM, the EM64T chips have a kludge that remaps the higher memory to lower memory via pointers.

Re:Will Intel Adopt These Instructions? (1)

andreyw (798182) | more than 6 years ago | (#20242735)

I don't know what you're smoking, but I want some of it.

Let's start with some basic facts, which you can verify for yourself by hitting the long mode specs in the AMD and Intel manuals:
1) You need PAE enabled (in CR4). Long mode uses a 4-level page table scheme (PML4 - PDPT - PD - PT), although you can get away with only using the first three levels if you are fine with 2MB granularity.
2) The linear address space is 64 bits.
3) The physical address space, ATM AFAIK, is 52 bits, with the other bits reserved for now. (Going beyond 52 bits will likely need a PML5.)
4) All registers are extended to 64-bit length, and there are 8 new general-purpose registers.
5) I am going to reiterate: your address space is 64 bits. Your addressable memory is 2^64 - 1 bytes. Unlike PAE, where your linear address space is still 32 bits, you do not need an aperture within your linear 4GB to access physical addresses > 4GB.

I have no clue what the hell you're talking about when you talk about "pointers", which are a software language concept. On EM64T/AMD64 you can perform direct and indirect MOVs to and from your entire linear (i.e. virtual) address space - and thus, through the "wonder" of paging (which you need enabled to enter long mode in the first place) - to and from your entire physical address space.
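To make the four-level walk in point 1 concrete, here is a small sketch of the standard long-mode 4KB address split (current 4-level implementations translate 48 bits of the virtual address: 9 bits of index per level plus a 12-bit page offset):

    #include <cstdint>
    #include <cstdio>

    // Long-mode 4KB paging splits a virtual address as:
    //   [47:39] PML4 index | [38:30] PDPT index | [29:21] PD index
    //   [20:12] PT index   | [11:0]  page offset
    void split_virtual_address(uint64_t va) {
        const unsigned pml4 = (va >> 39) & 0x1FF;
        const unsigned pdpt = (va >> 30) & 0x1FF;
        const unsigned pd   = (va >> 21) & 0x1FF;
        const unsigned pt   = (va >> 12) & 0x1FF;
        const unsigned off  =  va        & 0xFFF;
        std::printf("PML4=%u PDPT=%u PD=%u PT=%u offset=%#x\n",
                    pml4, pdpt, pd, pt, off);
        // With 2MB pages the PT level is skipped and bits [20:0] are the offset.
    }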

If you want a tiny piece of advice - instead of half-understanding mailing-list threads and articles written by people who know what they're talking about TO people who know what they're talking about - just hit the specs. They're free. Shit dude, if you actually bothered to try some 64-bit programming (even at the user level, much less the system level) you would see that what you just wrote is just plain wrong.

Since this is Slashdot, I'll even give you links to the specs -
1) http://www.intel.com/products/processor/manuals/index.htm [intel.com]
2) http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_11467_11513,00.html [amd.com]

Re:Will Intel Adopt These Instructions? (1)

Anarke_Incarnate (733529) | more than 6 years ago | (#20243207)

I simplified the fucking explanation due to being tired, sue me.

https://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/release-notes/as-amd64/RELEASE-NOTES-U2-x86_64-en.html [redhat.com]

From the reference itself:

"Software IOTLB -- Intel® EM64T does not support an IOMMU in hardware while AMD64 processors do. This means that physical addresses above 4GB (32 bits) cannot reliably be the source or destination of DMA operations. Therefore, the Red Hat Enterprise Linux 3 Update 2 kernel "bounces" all DMA operations to or from physical addresses above 4GB to buffers that the kernel pre-allocated below 4GB at boot time. This is likely to result in lower performance for IO-intensive workloads for Intel® EM64T as compared to AMD64 processors.

Lack of 3DNow!(TM) instructions -- Intel® EM64T does not recognize the prefetch and prefetchw instructions while AMD64 processors do. The Red Hat Enterprise Linux 3 Update 2 kernel excludes these instructions in both C and assembly language code and therefore will suffer a small amount of performance degradation."

Re:Will Intel Adopt These Instructions? (1)

forkazoo (138186) | more than 6 years ago | (#20242919)

Technically correct and wrong at the same time. EM64T has a kludge in the way that memory is addressed. The EM64T chips cannot access memory above 4GB without using pointers.


Could you clarify that at all? I'm not the end-all, be-all expert on these things, but I do know enough to be sure that what you wrote is so not-correct as to not even be wrong...

Pointers really only matter from a relatively high-level software perspective. From a low-level hardware perspective, you can either say that pointers don't exist, or that all memory is accessed via pointers. The distinction of pointers vs. some other conceptual model for accessing data (such as Java's) just doesn't exist at that level. Consequently, talking about needing to use pointers to access memory above 4 GB is like trying to decide whether a Senator from Alaska rambling about the Internet smells more like cyan or yellow. You can certainly say he doesn't smell very yellow, and be somewhat correct, but the statement carries no information.

Re:Will Intel Adopt These Instructions? (0)

Anonymous Coward | more than 6 years ago | (#20241501)

How about the AMD64 ISA?

Re:Will Intel Adopt These Instructions? (1)

SEE (7681) | more than 6 years ago | (#20242205)

x86-64 (AMD64) is the classic case.

Prior to that, the closest thing was when NexGen (just before AMD bought them) developed an MMX-like extension for the Nx686 (released by AMD as the K6) and cut a deal for Cyrix to use them, which is what provoked Intel into creating MMX with cross-licensing to AMD and Cyrix.

Re:Will Intel Adopt These Instructions? (1)

Necroman (61604) | more than 6 years ago | (#20242445)

Intel and AMD have some nice agreements between one another where they are allowed to share information about x86 processor extensions and the like. This means if one company designs a cool new extension, the other can pick it up with little hassle.

(Or at least that's how I remember it working)
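For reference, the way software discovers which of these shared extensions a given chip actually implements is the CPUID instruction. A minimal sketch using GCC/Clang's <cpuid.h> (bit positions are from the public Intel/AMD CPUID documentation):

    #include <cpuid.h>
    #include <cstdio>

    // Leaf 1: EDX bit 26 = SSE2, ECX bit 0 = SSE3.
    // Extended leaf 0x80000001: EDX bit 29 = 64-bit long mode (AMD64/EM64T).
    int main() {
        unsigned eax, ebx, ecx, edx;

        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            std::printf("SSE2: %s\n", (edx & (1u << 26)) ? "yes" : "no");
            std::printf("SSE3: %s\n", (ecx & (1u << 0))  ? "yes" : "no");
        }
        if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)) {
            std::printf("64-bit long mode: %s\n", (edx & (1u << 29)) ? "yes" : "no");
        }
        return 0;
    }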

good (0, Flamebait)

thatskinnyguy (1129515) | more than 6 years ago | (#20241499)

"... they could be used to accelerate Java, .Net, and dynamic optimizers."
About 80% of all Java-based apps I've run across could use all the help they can get in the speed department. Robust platform... Just the speed isn't quite there.

just about time (1)

yoprst (944706) | more than 6 years ago | (#20241569)

I never quite understood why chip manufacturers kept adding cores long after memory bandwidth had become a problem. Why not add specialized execution units and make the instruction set a bit fatter? It's not as if arithmetic and logic operations are all you can do with an int or a few ints. Same for floats (but even more operations).

Re:just about time (0)

Anonymous Coward | more than 6 years ago | (#20242173)

x86 is full of specialized instructions.

Hardware Accelleration == Bad Trend (-1, Troll)

Anonymous Coward | more than 6 years ago | (#20241803)

Yet another waste of silicon to 'accelerate' badly written software.

Instead of devoting transistors to speed up the latest toy programming languages ('managed' code), why can't we just train programmers better?

Ahh... of course, because of Java... don't bother learning HOW to optimize, let Java do it FOR you...

Re:Hardware Accelleration == Bad Trend (1)

ZakuSage (874456) | more than 6 years ago | (#20242405)

You can poo-poo Java all you want, but the reality is that it's made programming a lot easier for the "rest of us", especially in a world where cross platform compatibility is key.

Re:Hardware Accelleration == Bad Trend (0)

Anonymous Coward | more than 6 years ago | (#20242483)

Ah, yes, the 'rest of us', meaning to me, the mediocre programmers, or 'Code Monkeys'. Please, by all means continue to churn out steaming mounds of code.

It's cross-platform alright, but crap is crap, on any platform, and in any language.

Re:Hardware Accelleration == Bad Trend (1)

Chris Burke (6130) | more than 6 years ago | (#20242453)

Yet another waste of silicon to 'accelerate' badly written software.

AND well-written software. What, you think you could write code that's just as fast without all the "hardware acceleration" being done for you, without using any instruction set extensions that have been added over the years? You are on crack.

Instead of devoting transistors to speed up the latest toy programming languages ('managed' code), why can't we just train programmers better?

And better profiling tools are contrary to this goal how exactly? And at what point do you tell your better-trained programmers that using those hardware acceleration features will make their code go faster?

Ahh... of course, because of Java... don't bother learning HOW to optimize, let Java do it FOR you...

Or let your C compiler do it for you. Whichever. There's a matter of degree, to be sure, but even still you're most likely wasting your time "optimizing" individual lines of C code since the compiler can probably do a better job and that's been the case for quite a while. The thing that will get you the most bang for buck is the same in C as it is in Java -- optimize your algorithms. Java can't do that for you, and neither can your C compiler.
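A tiny illustration of that last point (a sketch; absolute numbers depend on the data, but the shape of the win does not): no amount of line-level tuning of the linear scan catches up with simply using a better algorithm on sorted data.

    #include <algorithm>
    #include <vector>

    // Membership test in a sorted vector. However carefully the O(n) scan is
    // hand-tuned, the O(log n) binary search wins by orders of magnitude on
    // large inputs: the algorithm choice dwarfs line-level micro-optimization.
    bool contains_scan(const std::vector<int>& sorted, int key) {
        for (std::size_t i = 0; i < sorted.size(); ++i) {
            if (sorted[i] == key) return true;
            if (sorted[i] > key)  return false;  // sorted, so we can stop early
        }
        return false;
    }

    bool contains_bsearch(const std::vector<int>& sorted, int key) {
        return std::binary_search(sorted.begin(), sorted.end(), key);
    }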

Re:Hardware Accelleration == Bad Trend (1)

glitch23 (557124) | more than 6 years ago | (#20242739)

It isn't Intel's job to train programmers to do things right. That is the responsibility of the education system. Nothing stops the education system from still teaching proper programming and design skills.

Re:Hardware Accelleration == Bad Trend (1)

shankarunni (1002529) | more than 6 years ago | (#20242795)

Yet another waste of silicon to 'accelerate' badly written software.

Instead of devoting transistors to speed up the latest toy programming languages ('managed' code), why can't we just train programmers better?

Ahh... of course, because of Java... don't bother learning HOW to optimize, let Java do it FOR you...

I'm tempted to slam this as an uneducated rant, but since there's a little teeny kernel of truth in it, I'll let it slide.

The issue is not "badly written code". It's being able to run the same compiled code on a wide variety of hardware without recompiling it for every chip variant.

The huge drawback with all the RISC architectures (at least initially) was that each version of each chip had different numbers of functional units, different latencies for the functional units, different latencies to cache and memory, etc.

If you ever dealt with the MIPS or Sun compilers, you know they have a huge number of flags for hyper-optimization on a variety of implementations of those architectures. The problem is that when you optimize for one variant, it often makes things worse on other variants (because instructions that didn't collide in the instruction pipeline now do, as just one example).

Now all of the modern architectures play the same games. Power/PowerPC, SPARC, Itanium, all of them. They all have multiple pipelines and execution units, massively parallel instruction issue, etc. Just like the X86.

And it's not because the programmers are idiots, but because that's the only way you could ever ship one binary that would run "optimally" on every implementation of that architecture.

PS. Java and C++ only make this worse because they are so dependent on such out-of-order massively-parallel execution (since they are so darn difficult to statically optimize).

The supreme irony of this is that for a while there, Java on x86 (Sun's implementation, no less!) ran rings around Java on SPARC (great strategy for pulling in customers for SPARC!). It's only with recent SPARC implementations (Niagara/Niagara 2), which play the same games as the x86s, that SPARC has finally caught up with and passed x86 again.

Is it really that hard? (0)

Anonymous Coward | more than 6 years ago | (#20241835)

I see all fuss about programming. easy. don't what the is parallel It's
I see all fuss about programming. easy. don't what the is parallel It's

all over the news? really? (0)

Anonymous Coward | more than 6 years ago | (#20241859)

"It has been all over the news today:". Really? The only AMD news I've been seeing all day has been "Barcelona not shipping on schedule, and parts won't be as fast as promised". Ooops. Well, those Core2's are still cheap. and faster.

Linus is right (-1, Troll)

Anonymous Coward | more than 6 years ago | (#20241877)

I am with Linus on this one. For the life of me I can't understand what this sucking up to RMS is about. Linus himself does not think GPLv3 is a good thing. So why do people keep adopting it.
Without Linus FOSS is tossed. Not following Linus is dangerous for the survival of FOSS.

Nothing special for Java or .NET (0, Troll)

dnoyeb (547705) | more than 6 years ago | (#20242365)

That must have been speculation or a SWAG from the poster, to suggest it could be used to accelerate Java and/or .NET. There is nothing special about Java or .NET that would allow this optimization. Both run on top of the OS and not on top of the hardware. So if the OS provided similar information about its routines, then that could be used. As it stands, the only thing that could accelerate Java or .NET (both of which are C/C++ programs) is something that would accelerate any C/C++ program running on top of an OS.

Re:Nothing special for Java or .NET (4, Insightful)

Wesley Felter (138342) | more than 6 years ago | (#20242451)

Performance counters could be used by JITs to generate more optimized code. I wonder which programming languages use JITs...

Re:Nothing special for Java or .NET (0)

Anonymous Coward | more than 6 years ago | (#20242663)

That assertion is in the original press release. But you're correct, there's no reason you couldn't use light-weight profiling in any other language. What makes it appealing for interpreted / JITed languages, though, is that the original program doesn't need to be aware of this - it's up to the JVM or .NET Framework to make your application fast for you. It'd probably be infeasible for a C compiler to spit out binaries that dynamically optimize themselves without you, the programmer, being aware of it.

Re:Nothing special for Java or .NET (1)

jhol13 (1087781) | more than 6 years ago | (#20243193)

There already are systems which do exactly that (optimise C programs dynamically); see http://arstechnica.com/reviews/1q00/dynamo/dynamo-1.html [arstechnica.com].

Of course, HotSpot-like JIT'd languages are the "easiest" target and most likely give the biggest performance improvement. After all, HotSpot already does partially (in SW) what the proposal does (in HW).

Re:Nothing special for Java or .NET (1)

TheNetAvenger (624455) | more than 6 years ago | (#20243347)

That must have been speculation or a SWAG from the poster to suggest it could be used to accelerate Java and/or .NET. There is nothing special about java or net that would allow this optimization.

Ok, sorry, wrong, and yes, wrong again...

The notes about .NET and JAVA come specifically from AMD themselves.

The reason it would benefit these environments is that they are processed on the fly, so the environment can make 'adjustments' to the code at runtime instead of it being 'locked' the way natively compiled code is.

This is level 101 understanding and logic here, not sure how you are missing this.

Side Channels (1)

DanLake (543142) | more than 6 years ago | (#20242641)

"They would give software access to information about cache misses..." Yeah that ought to help significantly with side-channel attacks against crypto software.

Re:Side Channels (1)

gnasher719 (869701) | more than 6 years ago | (#20242915)

>> "They would give software access to information about cache misses..." Yeah that ought to help significantly with side-channel attacks against crypto software.

I think you didn't read the spec. All that information is only available to the thread that is profiled; everything is context-switched so it can't leak out to other threads and definitely not to other processes.