First 16-Core Opteron Chips Arrive From AMD

timothy posted more than 2 years ago | from the one-for-each-candle dept.

angry tapir writes "After a brief delay and more than a year of chatter, Advanced Micro Devices has announced the availability of its first 16-core Opteron server chips, which pack the largest number of cores available on x86 chips today. The new Opteron 6200 chips, code-named Interlagos, are 25 per cent to 30 per cent faster than their predecessors, the 12-core Opteron 6100 chips, according to AMD."

189 comments

Only 16? (0)

masternerdguy (2468142) | more than 2 years ago | (#38050798)

Pfft, how much harder can it be to design one with 32 :)

Re:Only 16? (1)

unity100 (970058) | more than 2 years ago | (#38050846)

20 next year, 24 the next, and so on.

Re:Only 16? (4, Interesting)

ackthpt (218170) | more than 2 years ago | (#38050942)

Pfft, how much harder can it be to design one with 32 :)

Design? Easy.

Manufacture? Tricky.

Make work? Trickier.

To read about? Interesting.

Re:Only 16? (1)

unixisc (2429386) | more than 2 years ago | (#38051092)

Lesser incremental value? Even more difficult!

Re:Only 16? (1)

sjames (1099) | more than 2 years ago | (#38051060)

So what are you waiting for? Hop to it and corner the market!

Go ahead, I'll just wait over here and read the paper.

Re:Only 16? (1, Informative)

beelsebob (529313) | more than 2 years ago | (#38051340)

Pffft, it's only 8 cores anyway: 8 cores, each with 2 integer units. It's no more 16-core than Intel's 8 cores with hyperthreading.

Re:Only 16? (1)

jessehager (713802) | more than 2 years ago | (#38051474)

It's 8 cores per chip *and 2 chips per package* for a total of 16 cores.

Re:Only 16? (3, Informative)

beelsebob (529313) | more than 2 years ago | (#38051488)

No, 8 integer cores per chip, but 4 actual real cores. For a total of 8 cores across 2 chips.

Re:Only 16? (5, Insightful)

afidel (530433) | more than 2 years ago | (#38051720)

No, there are 16 integer pipelines with one scheduler and 4 logic units each, 16 128-bit floating point units that can also be combined into 8 256-bit units, and 8 fetch/decode units. This is not an MCU; it's one chip with the above-mentioned components. Whether it's 16 cores or 8 or 4 modules is kind of academic unless you are trying to optimize a scheduler for it, in which case the labels still don't matter; only the actual implementation and achievable performance matter.
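To make the labelling point concrete, here is a minimal sketch that takes the per-chip counts exactly as listed in the comment above and shows how much of each resource one "core" owns under a 16-core versus an 8-core label. The dictionary and the `share_per_core` helper are hypothetical illustrations, and the counts are taken as given rather than verified.

```python
from fractions import Fraction

# Per-chip resource counts exactly as listed in the comment above
# (taken as given; not independently verified).
CHIP_RESOURCES = {
    "integer pipelines": 16,
    "128-bit FP units": 16,
    "fetch/decode units": 8,
}


def share_per_core(claimed_cores: int) -> dict[str, Fraction]:
    """Return how much of each resource one 'core' owns if the chip is
    labelled as having `claimed_cores` cores."""
    if claimed_cores <= 0:
        raise ValueError("claimed_cores must be positive")
    return {name: Fraction(count, claimed_cores)
            for name, count in CHIP_RESOURCES.items()}


for label in (16, 8):
    shares = share_per_core(label)
    print(label, {name: str(share) for name, share in shares.items()})
```

Call it 16-core and each "core" gets half a fetch/decode unit; call it 8-core and each gets two integer pipelines. The hardware is the same either way, which is the comment's point.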

Re:Only 16? (4, Interesting)

beelsebob (529313) | more than 2 years ago | (#38051790)

The basic point is that it has a total of 8 instruction fetch units, a total of 8 instruction decode units that they feed, and a total of 8 chunks of L2 cache. The fact that each of these 8 cores has 2 integer units on it is neither here nor there; hell, for years cores have had several floating point units on them, and that didn't make them more than one core. Not only that, but this CPU behaves badly when the scheduler treats it as 16 cores instead of 8. The bottom line is that this chip behaves in every single way like an 8-core CPU; more so, it's slower than Intel's 8-core CPUs at a similar clock even with hyperthreading disabled.

Re:Only 16? (2)

Vancorps (746090) | more than 2 years ago | (#38052074)

What are you basing this on? As someone that runs both database and web servers using both AMD and Intel I find your conclusions to be completely counter to my experience and to the experience of almost everyone I know that does virtualized infrastructure.

I ran into a number of problems when I first tried to deploy them because SQL 2005 wouldn't install on them. SQL 2008 runs just great with 24 cores (they were dual-processor, 12-core servers). I have no reason to think the 16-core variants would be much different.

Re:Only 16? (4, Funny)

kvvbassboy (2010962) | more than 2 years ago | (#38051504)

Pic related: AMD vs Intel decision making [chanarchive.com].

Re:Only 16? (1)

Killjoy_NL (719667) | more than 2 years ago | (#38051816)

Pffff, why the troll mod? It's funny and on topic :)
Probably not very accurate, but still quite enjoyable :)

Re:Only 16? (0)

Anonymous Coward | more than 2 years ago | (#38052914)

Well, I got modded insightful after that, which was the last thing I expected or even intended. ;) And you are right, it was just a joke.

Re:Only 16? (1)

Chrisq (894406) | more than 2 years ago | (#38052320)

Pfft, how much harder can it be to design one with 32 :)

To run at the same speed? Very difficult. Think about twice the heat unless you make major changes.
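The "twice the heat" estimate follows from the usual first-order model of dynamic CMOS power, with α the activity factor, C the switched capacitance per core, V the supply voltage, f the clock and N the core count. Treating all cores as identical and ignoring leakage and shared uncore logic are simplifying assumptions of this sketch.

```latex
P_{\mathrm{dyn}} \approx N\,\alpha C V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_{\mathrm{dyn}}(N=32)}{P_{\mathrm{dyn}}(N=16)}
  = \frac{32\,\alpha C V^{2} f}{16\,\alpha C V^{2} f} = 2
```

Keeping the same power budget with 32 cores therefore means lowering V and f, and because power goes with V², even a modest voltage cut recovers a lot.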

Compared to Intel? (2)

Ed Avis (5917) | more than 2 years ago | (#38050816)

So... how do these compare to the new Sandy Bridge chips Intel announced on the same day? There must be some overlap of the target market - whether to buy a quad-socket Intel server or dual-socket AMD one, for example.

Re:Compared to Intel? (1)

Anonymous Coward | more than 2 years ago | (#38050834)

The Sandy Bridge chips released so far are all "Extreme" versions which suck power so much you'd be insane to use them for a server.

Re:Compared to Intel? (2)

0123456 (636235) | more than 2 years ago | (#38050948)

Given that an 8-core Bulldozer already needs its own power station to operate, I can't imagine Intel could have a worse TDP than a 16-core.

Re:Compared to Intel? (1)

unity100 (970058) | more than 2 years ago | (#38051088)

Intel can't field more than 6 cores at a time, even in Sandy Bridge-E. Multithreaded apps, like server apps, shine on Bulldozer.

Re:Compared to Intel? (5, Informative)

the linux geek (799780) | more than 2 years ago | (#38051264)

Intel's server chips are 8- and 10-core, and outperform Opterons by a considerable margin.

Re:Compared to Intel? (0)

Anonymous Coward | more than 2 years ago | (#38051986)

Source?

Re:Compared to Intel? (1)

the linux geek (799780) | more than 2 years ago | (#38052084)

SPECcpu results, TPC-H, and personal experience.

Re:Compared to Intel? (2, Interesting)

gilboad (986599) | more than 2 years ago | (#38052162)

While I do agree that AMD is *well* behind Intel's latest and greatest in the 1P / desktop world, I fail to see how you could make such a bold statement unless you have had the chance to compare an AMD 4S machine to an Intel 4S machine (say, an Opteron 62xx-based HP DL585G7 vs. a Xeon 75xx/E7-based HP DL580G7).

In my experience (and I venture a guess that it is just as good as yours, if not better) the picture is far from black-and-white and greatly (!!!) depends on the application being tested. The picture becomes even more complex once you factor in the Xeon E7's excessive price. ... So I ask again, have you had any experience benchmarking the Opteron 6200, or are you simply making things up as you go along?

- Gilboa

Re:Compared to Intel? (4, Interesting)

beelsebob (529313) | more than 2 years ago | (#38051400)

What's the Xeon E5-2650L, 2650, 2660, 2665, 2670, 2680, 2690 and 2687W then?

Hint: they're all 8-core SNB-E chips. Second hint: AMD's 16 "core" CPUs don't have 16 cores; they have 16 integer units. They only have 8 instruction fetch units, 8 decode units, 8 L2 caches, etc. That is, they're 8-core CPUs with strong integer support. SNB-E's particular strength is floating point, but it tends to beat the Opterons at pretty much anything that isn't heavily integer-biased.

Re:Compared to Intel? (2)

Vancorps (746090) | more than 2 years ago | (#38052142)

Which would be what? It sounds to me like databases and web servers benefit greatly from the AMD approach. Alternatives such as render farms use GPUs, so what strength is Intel actually offering?

Re:Compared to Intel? (2)

beelsebob (529313) | more than 2 years ago | (#38052154)

Not really, no. Databases and web servers don't spend their time doing parallel integer work; they spend their time doing logic work. Sandy Bridge kicks the snot out of it there.

Re:Compared to Intel? (1)

Luyseyal (3154) | more than 2 years ago | (#38052924)

If the logic is parallelizable, then the AMD chips could be a good choice. A webserver would be a good example of parallel logic in run-of-the-mill software were it not hampered by all that pesky I/O.
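The trade-off described here is usually quantified with Amdahl's law, swapped in for illustration since the comment does not name it: if a fraction p of the work parallelizes across n cores and the rest (for example, serialized I/O waits) does not, the speedup is 1 / ((1 - p) + p / n). The p values below are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` of the
    work scales across `cores`; the remainder runs serially."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Illustrative: how much do 16 cores help as the serial (I/O-bound) share grows?
for p in (0.99, 0.9, 0.5):
    print(f"p={p:.2f}: 8 cores -> {amdahl_speedup(p, 8):.2f}x, "
          f"16 cores -> {amdahl_speedup(p, 16):.2f}x")
```

With half the time spent in serial work, 16 cores deliver less than a 2x speedup; the serialized part, not the core count, sets the ceiling.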

-l

Re:Compared to Intel? (1)

bloodhawk (813939) | more than 2 years ago | (#38051806)

You need to find a better line for such fanboyism; using stuff that is easily known and proven wrong is just silly. Intel's server lines are 8- and 10-core. So far they have also trounced AMD in performance, though it would be nice if AMD could edge closer or even pass them with something new. Competition is much needed in this area, and it's something Intel has not had for a few years now.

Re:Compared to Intel? (4, Interesting)

unity100 (970058) | more than 2 years ago | (#38051916)

Is that why there have been 3 supercomputer orders in the last 3 weeks with AMD's Bulldozer Opterons?

sandy bridge ep 95W (3, Informative)

Chirs (87576) | more than 2 years ago | (#38051100)

There will be server versions as well...I've seen specs (publicly available) for an 8-core (16-thread) sandy bridge EP with a 95W TDP. I suspect it's clocked a bit lower and maybe binned for efficiency.

Re:Compared to Intel? (2)

Kjella (173770) | more than 2 years ago | (#38051296)

Even the fastest Sandy Bridge-E draws less power than a Bulldozer, even at much higher performance. It also costs 3-4 times as much, so performance/$ is quite shitty (hey, it's an extreme $999 proc), but the winner in performance is clear [pcper.com]. But thanks for trolling, come again.

Re:Compared to Intel? (2)

beelsebob (529313) | more than 2 years ago | (#38051968)

Really? Given that an 8 "core" Bulldozer FX-8150 gets beaten by a 4-core i5-2500, you would reasonably expect that this 16 "core" Bulldozer would get beaten by an 8-core Sandy Bridge chip with no hyperthreading at roughly the same clock speed. A little bit of imagination might convince you that a 6-core with hyperthreading might perform similarly too.

AMD: 16 "core" Bulldozer, $1000
Intel: 6-core + HT Xeon E5-1650 at a much higher clock, $583
Alternatively, if you want to be able to put the Intel chips in a NUMA configuration:
Intel: 6-core + HT Xeon E5-2640 at the same clock as the AMD chip, $884, but with only 95W power consumption

Final alternative:
Intel: 4-core (no HT) Xeon E5-2609 at roughly the same clock, $294; stick two of them in, and there you are.
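A minimal sketch of the cost side of this comparison, using only the list prices and core counts quoted above. It counts AMD's integer cores as "cores" and, for contrast, also shows the same chip counted as 8 modules. Performance is deliberately not modeled; the option labels and the `total_and_per_core` helper are illustrative.

```python
# Prices (USD per chip) and core counts as quoted in the comment above.
OPTIONS = [
    # (description, chips, price_per_chip, cores_per_chip)
    ("AMD 16-'core' Bulldozer", 1, 1000, 16),
    ("AMD same chip, counted as 8 modules", 1, 1000, 8),
    ("Intel Xeon E5-1650 (6C + HT)", 1, 583, 6),
    ("Intel Xeon E5-2640 (6C + HT)", 1, 884, 6),
    ("2x Intel Xeon E5-2609 (4C, no HT)", 2, 294, 4),
]


def total_and_per_core(chips: int, price: int, cores: int) -> tuple[int, float]:
    """Return (total price, price per core) for `chips` identical CPUs."""
    total = chips * price
    return total, total / (chips * cores)


for name, chips, price, cores in OPTIONS:
    total, per_core = total_and_per_core(chips, price, cores)
    print(f"{name:40s} ${total:5d}  ${per_core:7.2f}/core")
```

Per counted core the AMD part is not the expensive option; the argument in the comment rests on per-core performance, which this sketch leaves out.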

Re:Compared to Intel? (1)

KingMotley (944240) | more than 2 years ago | (#38051722)

Well, except the 130W TDP of the i7-3960X is less than the 140W TDP of the (almost equivalent) 6282 SE from AMD. Don't let facts get in the way of your beliefs.

how do they compare ? (0)

unity100 (970058) | more than 2 years ago | (#38050900)

They are much more capable in multithreaded performance; dozer cores were especially designed for it. For all its apparent 'sucking' on desktops, dozer surpasses all normal SBs (not SB-E) in heavily multithreaded apps (4 cores and more) like Photoshop CS5.

Servers are heavily multithreaded. Since dozer cores are especially suitable for the tasks that run on servers, more cores become even more desirable.

Re:how do they compare ? (2)

PIBM (588930) | more than 2 years ago | (#38051254)

1: You can buy your new sandy bridge from newegg or such right now, while those new bulldozers are nowhere to be found.
2: Overclocking any chip is bound to require a lot more power than the TDP no matter which one you are using.
3: Dozer's cores, as you said, feel like they are dozing on the job.

yes (2)

unity100 (970058) | more than 2 years ago | (#38051484)

That must be why 3 supercomputers with dozer Opterons have been ordered in the past 3 weeks.

Re:how do they compare ? (1)

KingMotley (944240) | more than 2 years ago | (#38051746)

No, they aren't.

Re:how do they compare ? (1)

beelsebob (529313) | more than 2 years ago | (#38051988)

Really? Because this looks like the FX-8150 getting beaten 3 ways silly by even an i5-2500 at Photoshop:
http://images.anandtech.com/graphs/graph4955/41688.png [anandtech.com]

Re:how do they compare ? (1)

unity100 (970058) | more than 2 years ago | (#38052058)

really.

http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-15.html [tomshardware.com]

radial blur, shape blur, median, polar coordinates.

This test employs threaded filters, taxing as many cores as we throw at it. Zambezi’s eight integer units capitalize, flying past the Core i5 and Core i7, outright trouncing the six-core Phenom II X6 1100T, too.

Re:how do they compare ? (0)

beelsebob (529313) | more than 2 years ago | (#38052118)

Notably though, everywhere else, even the i5 beats the shit out of it. Not because the other tests aren't multithreaded, but because they're not leveraging integer work exclusively. Or are you trying to suggest that the more threaded i7 beats the higher clocked i5 based on pure magic?

Bottom line – Bulldozer isn't good at multithreading, it's good at integer work. Unfortunately, servers are mostly logic work, so sandy bridge is likely to destroy it.

Re:how do they compare ? (1)

Rockoon (1252108) | more than 2 years ago | (#38052896)

Logic work *is* integer work, fool.

Re:how do they compare ? (1)

unity100 (970058) | more than 2 years ago | (#38052928)

Bottom line – Bulldozer isn't good at multithreading, it's good at integer work. Unfortunately, servers are mostly logic work, so sandy bridge is likely to destroy it.

Oh boy. I just saw this. You don't know shit.

'servers are mostly logic work', hahahahaa. Luckily someone else already answered you.

Next time, don't talk without knowing shit. 'Servers' means heavily multithreaded integer work. In these, Bulldozer excels, and that is also one of the reasons why there have been 3 AMD Opteron (Bulldozer 16-core) supercomputer orders in the past 3 weeks. NOT Intel. AMD. Opteron, Bulldozer. SUPERcomputer.

Re:how do they compare ? (5, Informative)

unity100 (970058) | more than 2 years ago | (#38052100)

and many, many, moooreeee

-mainconcept http://www.lostcircuits.com/mambo//i...&limitstart=17 [lostcircuits.com]
-mediashow http://www.guru3d.com/article/amd-fx...ssor-review/14 [guru3d.com]
-h.264 http://www.guru3d.com/article/amd-fx...ssor-review/14 [guru3d.com]
-vp8 http://www.guru3d.com/article/amd-fx...ssor-review/17 [guru3d.com]
-sha1 http://www.guru3d.com/article/amd-fx...ssor-review/17 [guru3d.com]
-photoshop cs5 http://www.lostcircuits.com/mambo//i...&limitstart=14 [lostcircuits.com]
-photoshop cs5 http://www.tomshardware.com/reviews/...x,3043-15.html [tomshardware.com]
-winrar, faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html [techspot.com]
-winrar, improves over x6 http://www.tomshardware.com/reviews/...x,3043-16.html [tomshardware.com]
-7-zip better than 2600k here: http://images.anandtech.com/graphs/graph4955/41698.png [anandtech.com] http://www.anandtech.com/show/4955/t...x8150-tested/7 [anandtech.com]
-7-zip same perf as 2600k http://www.tomshardware.com/reviews/...x,3043-16.html [tomshardware.com]
-POV-ray, faster than 2600k http://www.legitreviews.com/article/1741/10/ [legitreviews.com]
-POV-ray http://www.nordichardware.se/test-la...art=15#content [nordichardware.se]
-x264(2nd pass AVX enabled) http://www.anandtech.com/show/4955/t...x8150-tested/7 [anandtech.com]
-x264 (2nd pass, better overall than 2600k) http://www.bjorn3d.com/read.php?cID=2125&pageID=11108 [bjorn3d.com]
-x264 (2nd pass +.3 than SB2600k) http://www.legitreviews.com/article/1741/7/ [legitreviews.com]
-handbrake; http://www.legitreviews.com/article/1741/9/ [legitreviews.com]
-truecrypt; http://www.bjorn3d.com/read.php?cID=2125&pageID=11111 [bjorn3d.com]
-solidworks; faster than 2600k http://www.techspot.com/review/452-a...pus/page7.html [techspot.com]
-abbyy filereader http://www.tomshardware.com/reviews/...x,3043-16.html [tomshardware.com]
-C-Ray, as fast as $1k i7-990X, http://i664.photobucket.com/albums/v.../c-rayir38.png [photobucket.com]

Re:how do they compare ? (1)

beelsebob (529313) | more than 2 years ago | (#38052144)

Good work digging up all the graphs where Bulldozer manages to get between the i5 and the i7 (which, based on its price point, *it damn well should*, being priced halfway between the two). Unfortunately, while you've dug up a nice bunch of places where it just about holds its own, there are many times more where the Sandy Bridge chip eats it for breakfast, including heavily multithreaded work. As I said above: Bulldozer is good at very multithreaded integer work, and pretty much nothing else.

Re:how do they compare ? (1)

unity100 (970058) | more than 2 years ago | (#38052890)

there many times more

Yes. Then instead of shooting from the hip, list those times and occasions.

Re:Compared to Intel? (4, Interesting)

Surt (22457) | more than 2 years ago | (#38050932)

This would compete with the Xeon-E chips that aren't out yet. But in terms of performance it's about 75% per core, so this is the equivalent of a 12-core Intel chip.

Re:Compared to Intel? (1)

ByOhTek (1181381) | more than 2 years ago | (#38051074)

If they aren't out yet, how can you know? I wouldn't trust the performance benchmarks from either manufacturer.

Re:Compared to Intel? (1)

Surt (22457) | more than 2 years ago | (#38051570)

This assumes that performance is not significantly different from the desktop line, which is usually the case.

Re:Compared to Intel? (2)

Surt (22457) | more than 2 years ago | (#38051590)

Slight correction: on threaded workloads we'd be talking about a 6-core chip, since Intel runs 2 threads per core.
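Surt's two estimates are plain scaling. A sketch of that arithmetic, assuming his flat 75% per-core ratio and treating each of Intel's 2 hardware threads per core as one unit of throughput (the assumption behind the correction); both function names are hypothetical.

```python
def intel_equivalent_cores(amd_cores: int, relative_perf: float) -> float:
    """Scale an AMD core count by an assumed per-core performance ratio."""
    return amd_cores * relative_perf


def intel_cores_for_threads(threads: float, threads_per_core: int = 2) -> float:
    """Physical Intel cores needed to expose `threads` hardware threads (SMT)."""
    return threads / threads_per_core


equiv = intel_equivalent_cores(16, 0.75)
print(equiv, intel_cores_for_threads(equiv))  # 12.0 6.0
```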

Re:Compared to Intel? (1)

rrossman2 (844318) | more than 2 years ago | (#38050938)

Not sure what Tyan has planned and what the chips can do, but Tyan had boards that supported 4 quad-core Opterons, plus you could add a "daughter board" that allowed you to add 4 more (plus more RAM slots).

Now, that setup using 16-core CPUs in an EATX format would be crazy.

Re:Compared to Intel? (1)

ByOhTek (1181381) | more than 2 years ago | (#38051064)

Yeah. I could ditch my furnace in the winter with a computer like that... Might even have to open a few Windows.

Re:Compared to Intel? (2)

Talderas (1212466) | more than 2 years ago | (#38051158)

The idle heat would be sufficient, no? I don't see why you would need to open some windows just to ramp up the temperature, unless you're using this thing to heat a sauna.

Re:Compared to Intel? (4, Interesting)

beelsebob (529313) | more than 2 years ago | (#38051370)

Put simply, the AMD ones are slower than the Intel ones by about twofold per core. This isn't because AMD sucked at design so much as because their marketing department sucked at telling the truth. In reality, we're looking at 8-core AMD CPUs with 2 integer units per core, i.e. no more 16-core than Intel's are 16-core chips because of hyperthreading.

Once that's ironed out, the AMD chips turn out to have rather good performance if you want lots of integer work done, and the Intel chips to have rather good performance if you want anything else done.

fool. (1)

unity100 (970058) | more than 2 years ago | (#38051498)

They are like 3/4 of a core: neither 1 core, nor half a core.

Re:fool. (2)

beelsebob (529313) | more than 2 years ago | (#38051656)

The problem is, while this is true, Bulldozer also suffers from being a fairly crappy architecture design compared to Sandy Bridge. The result is that AMD's 8 "core" Bulldozer is only roughly as fast as Intel's 4-core i5 without hyperthreading. Extrapolate this to bolting two 8 "core" Bulldozers together and you get... well, that would only be about as fast as an 8-core Sandy Bridge with no hyperthreading, or a 6-core with hyperthreading. Given that Intel is selling 6-core E5 Xeons with hyperthreading for less than the $1000 AMD is asking for this, that really doesn't bode well, does it? This is of course forgetting that this Bulldozer is heavily underclocked to keep power consumption down. This really doesn't look promising for AMD.

Re:Compared to Intel? (0)

Anonymous Coward | more than 2 years ago | (#38051500)

The real interesting thing is the new Intel chips are actually 8 core with two of them disabled because of TDP limits.
Do the new Opterons burn as much power as the new E series? I'd guess not but I haven't seen any TDP comparisons of the new series from either manufacturer.

really 16 core? (1)

neuro88 (674248) | more than 2 years ago | (#38050888)

Hmmm... According to the article, these new chips seemed to be based on the bulldozer architecture, so it might be better to think of these opterons as 8 core chips that have really good hyperthreading.

Re:really 16 core? (1)

ackthpt (218170) | more than 2 years ago | (#38050972)

Hmmm... According to the article, these new chips seemed to be based on the bulldozer architecture, so it might be better to think of these opterons as 8 core chips that have really good hyperthreading.

Hold your horses, cowpoke.

Just because it's based upon doesn't mean it will suffer the same issues as the Bulldozer. Perhaps this is the core which really works well, while the more consumer oriented Bulldozer is the red-headed stepchild.

Re:really 16 core? (2)

the linux geek (799780) | more than 2 years ago | (#38051294)

They both have the same issues, including that each module (two 4-issue cores) has a single 4-instruction decoder in front of it. Cache latency is also likely to be similar if not the same.
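A toy model of that bottleneck: one 4-wide decoder feeding a module's two cores. It assumes the decoder serves one core per cycle and alternates between them when both have work, a simplification beyond what the comment states; the function name is hypothetical.

```python
def decoded_per_core(cycles: int, width: int, active: tuple[bool, bool]) -> list[int]:
    """Simulate one `width`-wide decoder shared by two cores.

    Each cycle the decoder serves one core. If both cores are active it
    alternates between them; if only one is active it serves that core every
    cycle. Returns the instructions decoded for each core over `cycles`.
    """
    decoded = [0, 0]
    turn = 0
    for _ in range(cycles):
        if active[0] and active[1]:
            decoded[turn] += width
            turn ^= 1  # hand the decoder to the other core next cycle
        elif active[0] or active[1]:
            decoded[0 if active[0] else 1] += width
    return decoded


print(decoded_per_core(100, 4, (True, False)))  # [400, 0]
print(decoded_per_core(100, 4, (True, True)))   # [200, 200]
```

Under this assumption each core sees the full 4 instructions per cycle when running alone but only 2 per cycle on average once its neighbour is busy.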

Re:really 16 core? (4, Interesting)

Zan Lynx (87672) | more than 2 years ago | (#38051168)

Maybe...

It'll be interesting. Most server applications are integer-only and never touch the floating point units. That should mean that Bulldozer designs work close to the full core count in contrast to the poor benchmarking results it puts out in Photoshop filters and video encode.

Re:really 16 core? (1)

Anonymous Coward | more than 2 years ago | (#38051274)

Exactly. The way I see Bulldozer is that it is a good chip for things like web hosting, databases, and middleware (i.e. "the cloud"). Floating point performance is not that important if your threads do not do floating point. Heck, even if half of the threads do floating point, you are fine.

Frankly, I only care how fast each thread can run and access memory. This is what is important in server consolidation. Floating point, meh.

Re:really 16 core? (1)

gweihir (88907) | more than 2 years ago | (#38051414)

Well, as Intel hyperthreading is basically brain-dead (I had to disable it for decent performance, as some things were glacially slow), really good hyperthreading just means usable hyperthreading for me. If Intel did not have so much money, AMD would have blown them away a long time ago. Intel technology sucks badly.

Bulldozer Cores are not that Great (4, Interesting)

TheTyrannyOfForcedRe (1186313) | more than 2 years ago | (#38050920)

The "cores" in Bulldozer are not your typical first-class x86 core. Bulldozer "cores" are worth 2/3 of a modern x86 core. The 6200 is more like a 10 core. Add to that the crappy IPC and I'm not impressed.

I was excited about Bulldozer before it was released. It's not often that CPU makers take chances on radical new architectures. Too bad this one turned out to be a huge pile of fail.

Re:Bulldozer Cores are not that Great (1)

synapse7 (1075571) | more than 2 years ago | (#38051330)

Hopefully they can be improved upon. I remember the first P4s had enough suck to be the target of a class-action suit.

Re:Bulldozer Cores are not that Great (0)

Anonymous Coward | more than 2 years ago | (#38051428)

true, but P4 was never "improved upon". It was orphaned. The new and fancy features of the P4 architecture (aside from SMT) began, and died, with the P4.

Re:Bulldozer Cores are not that Great (5, Informative)

Theovon (109752) | more than 2 years ago | (#38051386)

Your description is inaccurate, but that's not surprising since most slashdot readers don't know much about CPU architecture.

Bulldozers are essentially full-fledged cores, where the two cores in each module are mostly independent. There are two completely independent integer pipelines, so people seem to want to harp on the fact that the FPU is "shared". It's really a single split FPU, where each half can execute independent instructions, as long as the data width is 128 bits or less. Only when it is executing 256-bit AVX instructions is there any competition for resources. This is a very sensible design decision, since you don't find enough AVX software right now to justify completely dedicated AVX logic. (Plus, IIRC Sandy Bridge's FPU is only 128 bits wide and issues AVX instructions in two cycles, so what's the difference?) Moreover, even with AVX-heavy workloads, most software won't issue AVX instructions every cycle, and two AVX-heavy tasks on the same module won't really run into much contention. Assuming my memory of Sandy Bridge's FPU is correct, then Bulldozer has the advantage of having lower latency within the FPU on isolated AVX instructions.
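A toy model of the split FPU as described in this comment (not a model of the real hardware): two 128-bit halves shared by a module's two cores, where a 128-bit op occupies one half and a 256-bit AVX op occupies both. In-order issue of one op per core per cycle, alternating priority, and ignoring latency and pipelining are all simplifying assumptions; the function name is hypothetical.

```python
from collections import deque


def cycles_to_drain(stream_a: list[int], stream_b: list[int]) -> int:
    """Toy model of one FPU split into two 128-bit halves shared by two cores.

    Each list holds operand widths (128 or 256) in program order. Per cycle a
    128-bit op occupies one half and a 256-bit op occupies both; each core
    issues at most one op per cycle, in order. Which core picks first
    alternates every cycle. Latency and pipelining are ignored.
    """
    for width in (*stream_a, *stream_b):
        if width not in (128, 256):
            raise ValueError(f"unsupported width: {width}")
    queues = [deque(stream_a), deque(stream_b)]
    cycles = 0
    while queues[0] or queues[1]:
        free_halves = 2
        first = cycles % 2  # alternate priority so neither core starves
        for idx in (first, 1 - first):
            q = queues[idx]
            if q:
                need = 2 if q[0] == 256 else 1
                if need <= free_halves:
                    free_halves -= need
                    q.popleft()
        cycles += 1
    return cycles


print(cycles_to_drain([128] * 100, [128] * 100))  # 100: halves work independently
print(cycles_to_drain([256] * 100, [256] * 100))  # 200: each op needs the whole unit
print(cycles_to_drain([256] * 100, []))           # 100: one core alone runs at full rate
```

This reproduces the claim that contention only appears when both cores issue 256-bit work; whether the real FPU behaves this way is disputed in later replies.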

The PROBLEM with Bulldozer is that they just have not done some of the really aggressive and costly things that Intel has done in their design. Bulldozer is still a 3-issue design. While going to 4-issue doesn't help that much that often, it still gives Sandy Bridge a slight edge. But where SB REALLY gets its advantage is the huge instruction window. Intel found clever ways to shrink the logic for various components so that they could make room for a much larger physical register file and reorder buffer. As a result, SB can have many more decoded instructions in flight, which exposes more instruction-level parallelism and, critically, absorbs more memory access latency.

A Sun engineer (discussing Rock, among other things) once described modern CPU execution as a race between last-level cache misses. When you have a miss on your L3 cache, it can cost hundreds of cycles, upwards of 1000. During that miss, the CPU fills up its reservation station with other instructions and then stalls, waiting on something to retire. This won't happen for a long time. Because of the disparity in speed (and latency) between compute and memory access, this is typically the most significant bottleneck. By enlarging the instruction window, SB can achieve much higher throughput, and it shows in the benchmarks.
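A back-of-the-envelope version of the "race between cache misses" argument, under loudly simplified assumptions: a single isolated last-level miss blocks retirement, the front end keeps filling the window at full issue width with independent instructions, and nothing else stalls. The 300-cycle latency and 4-wide issue are illustrative values, and the window sizes are the in-flight figures mestlick quotes further down (168 for Sandy Bridge, 128 per Bulldozer cluster); the function name is hypothetical.

```python
def stall_cycles_per_miss(window: int, issue_width: int, miss_latency: int) -> float:
    """Cycles the core sits idle on one isolated last-level cache miss.

    Toy model: the missing load blocks retirement at the head of the
    reorder buffer, the front end keeps filling the window at `issue_width`
    instructions per cycle, and once all `window` entries are occupied the
    core stalls until the miss returns.
    """
    cycles_to_fill = window / issue_width
    return max(0.0, miss_latency - cycles_to_fill)


# Illustrative parameters (assumptions, not measurements): 4-wide, 300-cycle miss.
for window in (128, 168):
    stall = stall_cycles_per_miss(window, 4, 300)
    print(f"window={window}: {stall:.0f} stall cycles per miss")
```

In this model the larger window hides only 10 more cycles of a 300-cycle miss, so any bigger advantage in practice would have to come from effects it leaves out, such as overlapping several independent misses inside the larger window.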

This is Bulldozer's Achilles' heel. I know there are a few benchmarks where Bulldozer is faster than SB, but they're not typical workloads with typical memory footprints. Anyhow, so if you're going to rag on Bulldozer, rag on it for the right reasons. Bulldozer's "shared" FPU is a red herring.

Re:Bulldozer Cores are not that Great (0)

Anonymous Coward | more than 2 years ago | (#38051554)

Go bulldozer! [techreport.com]

Re:Bulldozer Cores are not that Great (5, Informative)

Artraze (600366) | more than 2 years ago | (#38051708)

The OP is right, and seems to understand the issues far better than you. It isn't that the FPU is shared; it's that nearly _everything_ is shared: instruction cache, fetch and decode, FPU, L2 data cache. The only things that aren't shared are the L1 data cache and integer operations (scheduler and ALU).

Instruction issuing and cache misses are big performance areas, but these are precisely the resources the cores share! You're running two threads off the same caches and instruction fetch (with the exception of L1 data). So, in reality, the second core in Bulldozer is much more like ultra-hyperthreading than it is a second core. I think the fact that they're even listed as cores is a marketing strategy that has backfired pretty hard.

P.S. L3 cache has proven to be quite useless in many workloads... It helps a bit in servers, IIRC, but that's about it. So it's more a race to L2 cache, which, again, is a shared resource. AMD, in fact, has indicated that it may drop the L3 from desktop parts.

Re:Bulldozer Cores are not that Great (1)

Rockoon (1252108) | more than 2 years ago | (#38052228)

If you look at the performance numbers comparing the Phenom II X4 830 (2.8 GHz) to the new A8-3850 (2.9 GHz), you see that the lack of L3 isn't a problem at all when you can also pack on twice as much L2.

Re:Bulldozer Cores are not that Great (0)

Anonymous Coward | more than 2 years ago | (#38052898)

- Bulldozer has a 64kB L1 instr cache (per mod), which is 2x SandyBridge
- Bulldozer has a 2MB L2 cache (per mod), which is 10x SandyBridge

Sure they are shared, but given the size increase that isn't necessarily a bad thing. The real issues are cache contention due to cacheline sharing and access latency to said cache (competing requirements based on associativity); ultimately, what matters is the effectiveness of the shared components within a given workload.

- Bulldozer has a 16kB L1 data cache (per core), which is 1/2 of SB.

Personally I'd be more worried about this fact, even though it's not shared. Again, real workloads and real data indicating a component's effectiveness are more important (i.e. L1 cache miss rates + cycle cost).

Re:Bulldozer Cores are not that Great (1)

Tomato42 (2416694) | more than 2 years ago | (#38051876)

Then why are Bulldozers slower than Phenom IIs in file compression (rar, zip, 7z, pick your poison) clock for clock? That's definitely not a shared FPU problem...

Re:Bulldozer Cores are not that Great (1)

mestlick (88537) | more than 2 years ago | (#38052022)

There are a few big mistakes about Bulldozer here.

The FP is completely shared between the integer clusters. The FP is 4-wide and the two clusters compete for all the resources in the FP.

Each Bulldozer integer cluster is 4-wide. The shared instruction fetch is also 4-wide.

Sandy Bridge has 168 instructions in flight and Bulldozer has 128 per cluster. Sandy Bridge has a combined FP/INT scheduler with 54 entries. Bulldozer has separate schedulers with 40 INT per cluster and 60 FP entries.

You are correct about BD's Achilles' heel. The L2 and L3 latencies are longer than SB's. I think the solution is to reduce the latencies, not increase the in-flight window size.

Re:Bulldozer Cores are not that Great (0)

TheTyrannyOfForcedRe (1186313) | more than 2 years ago | (#38052060)

Your description is inaccurate, but that's not surprising since most slashdot readers don't know much about CPU architecture.

Gotta love Slashdotters who think they know you inside and out after reading one post. I graduated from one of the top Computer Engineering programs in the world and with very good grades. I ended up going into software after graduation but I did study computer architecture extensively in college, and designed and built a working CPU for my senior design project.

Re:Bulldozer Cores are not that Great (1)

WilliamBaughman (1312511) | more than 2 years ago | (#38052638)

Your description is also inaccurate. Instruction decode and L2 cache are shared between cores in Bulldozer modules as well; I wouldn't ding Bulldozer for the shared L2 cache but the L1 cache is write-through, and there doesn't seem to be enough cache bandwidth to keep both integer cores busy. Bulldozer is not a 3-issue design, it is a 4-issue design. With regards to Bulldozer's Achilles' heel, I think that its deficiency in single-threaded performance comes more from actual cache misses and latency than the smaller instruction window. I could be proven wrong by architectural studies that come out in the future. Either way, those studies will be interesting.

Re:Bulldozer Cores are not that Great (1)

loufoque (1400831) | more than 2 years ago | (#38052766)

(Plus, IIRC Sandy Bridge's FPU is only 128 bits wide and issues AVX instructions in two cycles, so what's the difference?)

My SSE code converted to AVX runs twice as fast (not all of it, though; certain instructions do run in two cycles).
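A 2x speedup matches the register-width arithmetic. An AVX register is 256 bits versus 128 bits for SSE, so each instruction covers twice as many elements. The minimal Python sketch below computes that ideal ceiling. It ignores memory bandwidth and any instructions the hardware splits into two 128-bit halves.

```python
SSE_BITS = 128  # width of an SSE (XMM) register
AVX_BITS = 256  # width of an AVX (YMM) register


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements a single SIMD instruction operates on."""
    return register_bits // element_bits


for name, elem_bits in (("float32", 32), ("float64", 64)):
    sse, avx = lanes(SSE_BITS, elem_bits), lanes(AVX_BITS, elem_bits)
    print(f"{name}: SSE {sse} lanes, AVX {avx} lanes, ideal speedup {avx / sse:.1f}x")
```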

Re:Bulldozer Cores are not that Great (0)

Anonymous Coward | more than 2 years ago | (#38051436)

It depends on what you're working with. Most logic is integer based and with that a Bulldozer core scales really well.
http://www.phoronix.com/scan.php?page=article&item=amd_bulldozer_scaling&num=5

Re:Bulldozer Cores are not that Great (2)

ericloewe (2129490) | more than 2 years ago | (#38051440)

Bulldozer was very poorly handled from the beginning. What really surprises me is that they tried the NetBurst approach: when all else fails, go for clocks. Unfortunately, ARM seems to be focusing on a similar strategy (more cores, higher clocks, less focus on IPC). Anyway, I don't buy their "poorly optimized" story. They knew all about it and could've waited; surely they realized at the early stages of development that OSes aren't optimized for this yet. They could've delayed Bulldozer and pushed out yet another incremental upgrade to the Phenoms, and the die shrink alone would probably have yielded better results than those achieved by Bulldozer.

Meanwhile Intel is able to get away with what is essentially 50% more performance in multi-threaded applications and 0% more in single-threaded ones (save minor influences from the memory subsystem and cache, which surprisingly have a HIGHER latency than SB). All this for around 100% more cash, plus added costs for "high-end" motherboards (still lacking native USB 3.0 from the chipset, along with only two native SATA 6.0Gb/s ports), quad-channel memory and a cooler.

Poor performance (1)

Anonymous Coward | more than 2 years ago | (#38050980)

I have a test machine with the 12-core version and the single-core performance is truly dreadful. Intel chips that are several years older perform way better in this regard. Even with a workload where the 16 cores can all be used to the fullest extent, I doubt the performance comes close to modern Intel chips.

Re:Poor performance (2)

nomel (244635) | more than 2 years ago | (#38051208)

This isn't the point. You get 16 cores (slowish compared to top of the line, they may be) that will fit in a single socket on a single motherboard, with a single power supply. This is a *huge* cost saving for machines that it makes sense to use them in...servers, where single core performance is relatively stupid to consider.

Re:Poor performance (3, Insightful)

the linux geek (799780) | more than 2 years ago | (#38051310)

Servers need single-thread too; think stuff like big database writes, joins, ERP, and CRM. Think outside the embarrassingly-parallel web-serving box.

If multithreaded performance were all that mattered, the Sun Niagara chips would have done a lot better than they did.

Re:Poor performance (3, Insightful)

DarkOx (621550) | more than 2 years ago | (#38051510)

Umm, joins can be done in parallel in lots and lots of cases. ERP and CRM are applications that ought to see big improvements from more cores, if you have more than a few users anyway. It also simplifies things: you don't have to figure out how to architect the thing to run across 10 hosts anymore. Good multi-core systems deliver the performance these apps need if you can get the disk I/O solved, and a good SAN with multipath support and multiple HBAs can get there.
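As a rough illustration of why many joins parallelize, here is a minimal Python sketch of a partitioned hash join. Both tables are split by a hash of the join key, so matching rows always land in the same partition, and each partition pair can be joined independently on its own core. The function names are hypothetical. Real database engines do this in native code, and in CPython the threads here mainly show the structure rather than delivering a true multi-core speedup, because of the global interpreter lock.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n):
    """Split rows into n buckets by the hash of their join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def hash_join(left, right, key):
    """Single-threaded hash join of two lists of dicts on `key`."""
    table = defaultdict(list)
    for row in left:  # build phase
        table[row[key]].append(row)
    # Probe phase: emit one merged row per matching (left, right) pair.
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def parallel_join(left, right, key, workers=4):
    # Rows with equal keys hash to the same partition index, so the
    # partition pairs are independent and can be joined concurrently.
    lparts = partition(left, key, workers)
    rparts = partition(right, key, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(hash_join, lparts, rparts, [key] * workers)
    return [row for part in results for row in part]
```

For example, `parallel_join(customers, orders, "cust")` returns the same rows as `hash_join(customers, orders, "cust")`, though possibly in a different order.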

Niagara failed because each individual core was too slow: a comparably priced Intel CPU could do two jobs serially on one core in less time than Niagara could do one job on one core. The question here, for mostly parallel workloads like a database where all cores will be used, is whether AMD's 16-core chips are at least 62% the speed of Intel's 10-core chips on a core-vs.-core basis. If so, other things being equal, these Opterons will be better for *some* workloads.
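The 62% figure is simply the ratio of core counts, 10/16 = 0.625. It assumes an idealized workload whose throughput scales linearly with cores on both chips. Here is a minimal Python sketch; the function name is illustrative only.

```python
def breakeven_per_core(cores_a: int, cores_b: int) -> float:
    """Per-core speed chip A needs, relative to chip B, to match B's total
    throughput, assuming perfectly parallel work that scales linearly."""
    return cores_b / cores_a


ratio = breakeven_per_core(16, 10)
print(f"{ratio:.1%}")  # 62.5%
```

In practice, memory bandwidth, shared resources within a module, and imperfect scaling all move the real break-even point.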

 

Re:Poor performance (1)

timster (32400) | more than 2 years ago | (#38051344)

The big question I have is if it will be like AMD's previous 12-core chips, where you could get 4 of them crammed into a 2U server for not all that much money. 4-Xeon configurations are way more expensive.

Wish List (3, Informative)

Nom du Keyboard (633989) | more than 2 years ago | (#38051020)

I so much want some real competition for Intel. Competition that doesn't artificially limit clock speeds and fuse off perfectly good working features in order to market a dozen overlapping and conflicting SKUs at a dozen different price points. And working drivers, current standards (DirectX 11 and OpenCL for starters), and USB-3 that doesn't require a $50 cable between every device would be nice.

Intel vs AMD's philosophy as of late (1)

Anonymous Coward | more than 2 years ago | (#38051076)

Intel: "Let's improve the memory controller's bandwidth, increase our IPC and also improve our platform by adding more PCIe lanes to the chipset that enthusiasts will find a use for"
AMD: "MOAR COARS!!!111!one"

Re:Intel vs AMD's philosophy as of late (4, Interesting)

level_headed_midwest (888889) | more than 2 years ago | (#38051658)

Eh, how about this:

Intel: I know, let's try to see just how many features/cores/cache we can fuse off in our dies and different socket combinations to try to make *puts pinky finger to mouth* one MILLION SKUs! Oh, and while we're at it, let's add a FOURTH memory channel, because more is better! Sure, we could get all the bandwidth we need with two DDR3-1866 or -2133 channels, and you really only get about three channels' worth of bandwidth because we have to clock the IMC down to DDR3-1333 with two modules per channel, but we still have FOUR channels! Oh, and we forgot, it's the start of a new quarter so we need to release a new socket. Can't let those socket suppliers get lazy making last quarter's socket design. What, you guys want us to release Sandy Bridge-based Xeon MPs because MP platforms actually need that much bandwidth and core count? We just released the Westmere-based ones a few months ago! Don'tcha know that Xeon MPs run two years behind everything else? Geez, what did you do, wake up yesterday? Next you'll want us to stop crippling our chips, stop using a new socket every other month or something ridiculous like that. Where do you guys get those ideas?
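For reference, theoretical peak DDR3 bandwidth is channels × transfer rate × 8 bytes, because each DDR3 channel is 64 bits wide. Below is a quick Python sketch for the configurations mentioned above. These are peak figures only; sustained bandwidth is noticeably lower.

```python
BYTES_PER_TRANSFER = 8  # one DDR3 channel is 64 bits wide


def peak_gb_s(channels: int, mt_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return channels * mt_per_s * BYTES_PER_TRANSFER / 1000


for ch, speed in ((2, 1866), (2, 2133), (4, 1333)):
    print(f"{ch} x DDR3-{speed}: {peak_gb_s(ch, speed):.1f} GB/s")
```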

AMD: Based on market analysis, most server applications use primarily integer code and require a lot of bandwidth, memory capacity, and a high core count. We don't have over a hundred billion dollars in market cap to fund several parallel R&D teams to design a specific CPU for every edge use case, so we will design a CPU that is highly modular, has good integer performance (because that's what our research indicated most server apps are), and has a lot of cores. Experience with Intel's HyperThreading is less than stellar with regards to predictable performance, so we will use our CMT approach that leads to better integer performance than HyperThreading but doesn't increase the die size by a huge amount, since we can't afford to make 400-600 mm^2 dies like Intel does to have a lot of physical cores. Oh, and we'll continue to use the existing server platforms out there so our customers can drop-in upgrade and we'll also not change any feature sets in the SKU stack other than the clock speed and number of enabled modules and their associated caches. We do apologize for being "late" with these parts since we usually release server and client at about the same time...

Crippling chips (2)

Quila (201335) | more than 2 years ago | (#38052366)

It's common, live with it. Every Cell processor in a PS3 comes with eight synergistic processing elements (SPEs), with one disabled. That way they can spec the part for seven and use most of the chips that come off the line.
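The yield argument can be made concrete with a binomial model: if each unit is independently defect-free with probability p, requiring only 7 of 8 working units accepts many more dies than requiring all 8. Below is a minimal Python sketch. The 95% per-unit figure is a hypothetical value, and the model ignores defects elsewhere on the die.

```python
from math import comb


def yield_at_least(k_needed: int, n_units: int, p_good: float) -> float:
    """Probability that at least k_needed of n_units independent units work."""
    return sum(
        comb(n_units, k) * p_good**k * (1 - p_good) ** (n_units - k)
        for k in range(k_needed, n_units + 1)
    )


p = 0.95  # hypothetical probability that a single SPE is defect-free
print(f"all 8 good:      {yield_at_least(8, 8, p):.1%}")
print(f"at least 7 good: {yield_at_least(7, 8, p):.1%}")
```

With this assumed rate, about 66% of dies have all eight units working, but about 94% have at least seven.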

Even AMD had a problem with too-good yield about ten years ago, so they restricted the clock and sold "crippled" low-end chips that were technically rated to run at much higher speeds.

Re:Intel vs AMD's philosophy as of late (0)

Anonymous Coward | more than 2 years ago | (#38052794)

I remember in the K6/K6-2 days how AMD was harping on about FPU performance.

Some of us bought into the FPU thing, only to discover a P-II with faster integer performance was better, as most games used integer-based fixed-point math, and everything else later on was handed off to Glide.
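Fixed-point math stores fractional values in plain integers with an implied binary point, so every operation runs on the integer units. Below is a minimal Python sketch of a 16.16 format. The format choice and function names are assumptions for illustration; games of that era used a variety of formats.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 1.0 in 16.16 fixed point


def to_fixed(x: float) -> int:
    """Convert a float to 16.16 fixed point (rounded to nearest)."""
    return int(round(x * ONE))


def to_float(f: int) -> float:
    return f / ONE


def fixed_mul(a: int, b: int) -> int:
    # The raw integer product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS


a, b = to_fixed(3.5), to_fixed(-1.25)
print(to_float(fixed_mul(a, b)))  # -4.375
```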

Mind you, the K6 series chips were much cheaper than Intel-based solutions, so we did get good 'bang for our buck'.

It's kinda funny that they're on to the integer performance nowadays, whereas most games are floating point based all the way through. Oh well, at least the back-end people will be happy.

Re:Intel vs AMD's philosophy as of late (2)

Anarke_Incarnate (733529) | more than 2 years ago | (#38052430)

AMD already had the on-die memory controller. Their answer to Intel's Hyperthreading was real cores. The QPI bus that Intel uses is very similar to HyperTransport, which AMD pioneered. Let's not forget that AMD64 (oh, did you want me to call it EM64T or x86_64?) was a product of AMD's engineering effort, rather than an attempt to force people toward the EPIC architecture, which seems to have remained a niche.

Can you imagine a beowulf cluster of these? (0)

Anonymous Coward | more than 2 years ago | (#38051136)

Sorry, had to try and bring it back...

Compared to Intel (0)

Anonymous Coward | more than 2 years ago | (#38051220)

We have a Sandy Box here at the office and we've benchmarked it and got some pretty impressive results. We have an NDA, but all I can say is that the performance is game-changing. So excited...

University Employee
Research Computing

When are multiple cores going to help me? (4, Interesting)

craftycoder (1851452) | more than 2 years ago | (#38051706)

I just got a fancy 8-core T7500 Dell workstation, and only one of my compilers actually takes advantage of the multiple cores when it is compiling. As a result this expensive desktop is only 15% faster in terms of time to compile than the 4-year-old PC it replaced (the new PC has twice the RAM of the old one, which may account for some of that speed increase). I am seriously unimpressed with all these cores. Maybe they are useful for something, but I've not found anything that I do that shows significant improvement. Putting my development projects on an SSD did much more for my workflow performance than this fancy new computer, that is for certain.

Re:When are multiple cores going to help me? (4, Informative)

Anonymous Coward | more than 2 years ago | (#38051864)

You're doing it wrong.

make -j8

Re:When are multiple cores going to help me? (2)

JohnnyBGod (1088549) | more than 2 years ago | (#38052862)

And if it's too much for one machine, use distcc.

Re:When are multiple cores going to help me? (2)

fyngyrz (762201) | more than 2 years ago | (#38051970)

Try doing DSLR image editing with Lightroom or Aperture. Those cores make one hell of a difference.

Re:When are multiple cores going to help me? (0)

Anonymous Coward | more than 2 years ago | (#38052350)

GNU Make takes the argument -j for the number of jobs to run. What language and what platform are you using?

Re:When are multiple cores going to help me? (0)

Anonymous Coward | more than 2 years ago | (#38052376)

You should try getting a better compiler or build tool. Most modern toolchains will take full advantage of an 8-core system with the right switches. For example, you can tell GNU make to run 8 compile jobs at once with the -j 8 switch.

Re:When are multiple cores going to help me? (1)

Anonymous Coward | more than 2 years ago | (#38052472)

Consider taking advantage of "make -j" if you use that tool.

Re:When are multiple cores going to help me? (0)

Anonymous Coward | more than 2 years ago | (#38052732)

Is that compiling a single source file, or multiple files? With GNU Make, at least, "make -j N" will run N separate processes for compiling separate files.

Re:When are multiple cores going to help me? (1)

onefriedrice (1171917) | more than 2 years ago | (#38052734)

I just got a fancy 8 core T7500 Dell workstation and only one of my compilers actually takes advantage of the multiple cores when it is compiling.

If your compiler isn't threaded, then at least run multiple compile jobs simultaneously--this is probably better anyway. If your build system can't do this, your tools are broken.
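What `make -j N` buys you is simple: keep up to N independent compile commands running at once, so a single-threaded compiler still uses every core. Below is a minimal Python sketch of that idea. `run_jobs` is a hypothetical helper, not part of any build tool. Unlike make, it does no dependency tracking, so it assumes the commands are independent, as separate translation units usually are.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, jobs=None):
    """Run independent commands with at most `jobs` in flight, like make -j.

    Returns the exit codes in the same order as `commands`.
    """
    jobs = jobs or os.cpu_count() or 1

    def run(cmd):
        # Threads are enough here: each one just waits on a child process.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Trivial stand-in commands; real use would pass compiler invocations.
    print(run_jobs([[sys.executable, "-c", "pass"]] * 4, jobs=2))
```

For a C project, the commands might be something like `["cc", "-c", "parser.c", "-o", "parser.o"]`, one per source file (hypothetical file names), followed by a single link step once all of them return 0.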

Build your own tablet? (0)

Eggbloke (1698408) | more than 2 years ago | (#38052930)

I have been thinking for a while that you could build your own tablet with one of these boards. Strap a touchscreen to one side and a battery to the other and install some tablet edition of Windows or Linux and it should work pretty well. Certainly more powerful than most tablets available today.
The only issue might be power consumption but it's quite a good trade off for performance and modularity. You could just use a bigger battery anyway.
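To put the power trade-off in numbers, ideal runtime is simply stored energy divided by average power draw. Below is a minimal Python sketch. The battery capacity and power figures are assumptions chosen for illustration, not specifications of any Opteron 6200 part or board.

```python
def runtime_hours(battery_wh: float, draw_w: float) -> float:
    """Ideal runtime: battery energy (Wh) divided by average draw (W)."""
    return battery_wh / draw_w


BATTERY_WH = 100.0  # assumed: a very large laptop-class battery
CPU_W = 115.0       # assumed CPU power under load, typical of server parts
REST_W = 25.0       # assumed screen, board, memory and storage

hours = runtime_hours(BATTERY_WH, CPU_W + REST_W)
print(f"{hours * 60:.0f} minutes")
```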