
Smarter Thread Scheduling Improves AMD Bulldozer Performance

Soulskill posted more than 2 years ago | from the almost-up-to-par dept.


crookedvulture writes "The initial reviews of the first Bulldozer-based FX processors have revealed the chips to be notably slower than their Intel counterparts. Part of the reason is the module-based nature of AMD's new architecture, which requires more intelligent thread scheduling to extract optimum performance. This article takes a closer look at how tweaking Windows 7's thread scheduling can improve Bulldozer's performance by 10-20%. As with Intel's Hyper-Threading tech, Bulldozer performs better when resource sharing is kept to a minimum and workloads are spread across multiple modules rather than the multiple cores within them."
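The module-spreading idea in the summary can be sketched in a few lines. This is a hypothetical illustration of the scheduling preference, not the article's actual registry tweak; the function name and the modules-of-two layout are assumptions based on how Bulldozer pairs cores.

```python
# A minimal sketch (not AMD's or Microsoft's actual logic) of the scheduling
# idea in the summary: with two cores per Bulldozer module, spread threads
# across modules first, and only double up within a module as a last resort.

def module_spread_order(n_logical, cores_per_module=2):
    """Return logical CPU ids in the order a module-aware scheduler
    would prefer them: one core from each module first, then siblings."""
    modules = [list(range(m, m + cores_per_module))
               for m in range(0, n_logical, cores_per_module)]
    order = []
    for slot in range(cores_per_module):       # round-robin across modules
        for module in modules:
            order.append(module[slot])
    return order

if __name__ == "__main__":
    # An FX-8150 exposes 8 logical cores in 4 modules (0+1, 2+3, 4+5, 6+7).
    print(module_spread_order(8))   # -> [0, 2, 4, 6, 1, 3, 5, 7]
```

Filling cores in this order keeps resource sharing to a minimum until every module already has one busy core, which is the behavior the article's tweaks try to coax out of Windows 7.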


So... (0)

Anonymous Coward | more than 2 years ago | (#37870842)

Bulldozer sucks at multitasking, but it's great if programmers utilize parallel programming techniques (which they don't use right now anyway; because of this, multicore processors exist pretty much explicitly to improve multitasking performance).

Re:So... (1)

laffer1 (701823) | more than 2 years ago | (#37871070)

You mean integer-based instructions. Floating point is still not as good on the AMD chips (unless you're using the new instructions).

Re:So... (1)

EdZ (755139) | more than 2 years ago | (#37871242)

Worse, what this shows is that AMD's idea that you only need one FPU for every two integer units (how Bulldozer is laid out) results in a 20% performance drop.

Re:So... (1)

beelsebob (529313) | more than 2 years ago | (#37871906)

The idiocy here is that they've not succeeded in making Bulldozer faster; they've succeeded in making one very specific benchmark run faster with scheduler settings tuned for that exact benchmark. Give it some different code to run and this'll degrade performance.

no one got fired buying intel (1)

alen (225700) | more than 2 years ago | (#37870846)

That's the truth. Unless I can buy an AMD server for a lot cheaper, I'm not going to take on the risk of performance issues.

Re:no one got fired buying intel (1)

h4rr4r (612664) | more than 2 years ago | (#37871022)

Depends on what you mean by a lot cheaper. If you need lots of cores but don't need them fast, like for a VM host, then AMD servers can be quite a bit cheaper once we're talking about getting 128GB+ of RAM.

Risk of performance issues makes no sense if you don't know what app you want to run.

Re:no one got fired buying intel (3, Informative)

Antisyzygy (1495469) | more than 2 years ago | (#37871088)

AMD servers are way cheaper, and there are no performance issues most admins can't handle. What do you mean by performance? If you mean slower, then yes, but if you mean reliability then they are about the same. Why else do universities almost exclusively use AMD processors in their clusters for cutting-edge research? I can see your point if you are only buying 1-3 servers, but you start saving shitloads of money when it's a server farm.

Re:no one got fired buying intel (2)

KhazadDum (790345) | more than 2 years ago | (#37871368)

Agreed. To further expound upon the parent's point: if you really know your performance needs and requirements, and the initial extra cost of Intel chips is lower than the revenue gained from that extra couple percent of performance, then go Intel. Otherwise, it's usually a cost-versus-preference piss fest. And last I checked, in a down economy, cost is king.

Re:no one got fired buying intel (4, Interesting)

Kjella (173770) | more than 2 years ago | (#37871960)

Well, it doesn't seem to apply when you get up to supercomputing levels at least. I checked the TOP500 list [top500.org] and it's 76% Intel, 13% AMD. As for Bulldozer, it has serious performance/watt issues even though the performance/price ratio isn't all that bad for a server. On the desktop, Intel hasn't even bothered to make a response except to quietly add a 2700K to their pricing table, with the 2600K left untouched. On the business side (where after all margins fund future R&D), Sandy Bridge's 216mm2 is much smaller than Bulldozer's 315mm2. Intel can produce almost 50% more in the same die area; in practice the yields probably favor Intel even more, because the risk of critical defects goes up with size. Honestly, I don't think Intel has felt less challenged since the AMD K5 days...

Re:no one got fired buying intel (1)

Antisyzygy (1495469) | more than 2 years ago | (#37872174)

When you can save $8000 per server and then invest it in something else, it becomes a different issue. I am not trying to say AMD processors are superior, I am just saying that factoring in all costs, including power and the lifespan of the unit, AMD wins a lot of the time. Every computer cluster at every university I have ever had access to used AMD processors (with the exception of some NVidia units), and this was for their CS departments. I suspect part of the issue is it's easier to justify power budgets than to justify $8000 more per server to upper admins. Figuring you could buy 1.5 AMD servers for the price of 1 Intel server, you end up with a more cost-effective computer as far as total CPU performance and RAM capacity go.

Power consumption is not one of AMD's strong suits; I remember one of our server admins told me the power bill once for the main cluster, and it was sickening. I vaguely remember it being in the hundreds of thousands per year. It saddens me that AMD is in this situation, but I seem to remember a time when Intel was pulling some pretty anti-competitive moves, though AMD should have capitalized on its successes in the past. I seem to remember, at least for the desktop environment, the Athlon XPs had better gaming performance. I suppose that's a small market, but even that was an opportunity that could have been exploited better.

Re:no one got fired buying intel (1)

Antisyzygy (1495469) | more than 2 years ago | (#37872196)

By cost-effective computer I meant cluster! Also, I would like to add that I had high hopes for Bulldozer, so it was disappointing that it was all marketing hype.

Re:no one got fired buying intel (1)

bill_mcgonigle (4333) | more than 2 years ago | (#37872358)

Fast memory bus, nothing special needed to use ECC RAM, good work/watt, and low prices all help AMD win for most clusters.

If you're aiming for a Top-500 slot and you have server money but not real estate money, then Intel is the logical choice.

Re:no one got fired buying intel (1)

Anonymous Coward | more than 2 years ago | (#37871180)

With the issues we have encountered in the past with Intel's microcode updates, I really would not mind switching over to AMD... For most web and database servers the CPU performance really does not matter much unless you have an abundance of SSL connections, and even then the difference between the two manufacturers is marginal, hardly worth mentioning. You just have to make sure everything is tuned to the underlying system; if you don't know how to do that, you're in the wrong business.

Re:no one got fired buying intel (3, Informative)

QuantumRiff (120817) | more than 2 years ago | (#37871308)

A Dell R815 with 2 twelve-core AMD processors (although they were not Bulldozer ones), 256GB of RAM, and a pair of hard drives was $8k cheaper than a similarly configured Dell R810 with 2 ten-core Intel processors when we ordered a few weeks ago. That difference in price is enough to buy a nice Fusion-io drive, which will make much, much more of a performance impact than a small percentage higher CPU speed.

Re:no one got fired buying intel (1)

0123456 (636235) | more than 2 years ago | (#37871434)

Clearly AMD should be charging $4k more for their CPUs if they're leaving that big a gap between their price and Intel's.

Re:no one got fired buying intel (1)

Surt (22457) | more than 2 years ago | (#37871592)

They're fighting reputation. If it was $4k more, they would probably lose too many sales to make up the price difference.

Re:no one got fired buying intel (1)

Zorpheus (857617) | more than 2 years ago | (#37872004)

Maybe their reputation would be better if their processors cost the same.
Some people just assume that something must be worse when it is cheaper.

Re:no one got fired buying intel (1)

billcopc (196330) | more than 2 years ago | (#37872156)

Those kinds of people are very vulnerable to an optimistic young techie destroying their reputation as a purchaser, or so my last two years of sales would suggest. I displaced someone who would only buy "the best", which in his view meant something 5x more expensive, and whose every tech dispatch was accompanied by a sales guy to work the purchaser while the techie was busy installing the goods.

If AMD can deliver better performance per $ and per watt in the server room, I'll consider them, and so will my clients if it improves their bottom line.

Re:no one got fired buying intel (0)

Anonymous Coward | more than 2 years ago | (#37871808)

They would but they have to deal with morons spouting this "no one ever got fired buying Intel" bullshit. If you aren't bright enough to evaluate your requirements and determine appropriate price/performance you can always go with the status quo and say "well everybody else is doing it so it can't be that bad a decision".

Re:no one got fired buying intel (1)

nabsltd (1313397) | more than 2 years ago | (#37871882)

A Dell R815 with 2 twelve-core AMD processors (although they were not Bulldozer ones), 256GB of RAM, and a pair of hard drives was $8k cheaper than a similarly configured Dell R810 with 2 ten-core Intel processors when we ordered a few weeks ago.

The Westmere-EX CPUs on the Dell R810 are recently released, and as such are very pricey. They are also much, much faster than any other Intel or AMD chip on a per-clock basis. Because the E7-88xx Xeons have nearly twice the cache (30MB "smart" vs. 24MB total L2 plus L3), are hyper-threaded, and run faster clock-for-clock, a heavily parallel task will likely finish faster on a single CPU Westmere-EX than on a dual CPU Magny-Cours.

Because of this, the R810 is a much, much more powerful system than the R815, so it only makes sense that it's more expensive, although part of it is paying for the bleeding edge of Intel. In the more normal realm, you can get a pair of 2.4GHz 6-core E5645s for less than the price of a single 2.2GHz Opteron 6174. That's 12 cores and 24 threads vs. 12 cores, and overall more performance.

Re:no one got fired buying intel (1)

yuhong (1378501) | more than 2 years ago | (#37872410)

Yes, but I have wondered for a while what will happen to the quad-socket market if AMD sticks to the same pricing policy with Interlagos. Remember that Intel is one generation behind with Westmere-EX, and Sandy Bridge-EP is not even released yet right now.

Re:no one got fired buying intel (1)

afidel (530433) | more than 2 years ago | (#37872574)

Apples to apples, the cost difference between an R810 and R815 should be on the order of $200, not $8,000.

So basically... (1, Insightful)

Anonymous Coward | more than 2 years ago | (#37870854)

So basically they suck. I shouldn't need to tweak my os thread scheduler just so a cpu can suck less. AMD needs to fix their shit instead of lame excuses.

Re:So basically... (1)

ackthpt (218170) | more than 2 years ago | (#37870974)

So basically they suck. I shouldn't need to tweak my os thread scheduler just so a cpu can suck less. AMD needs to fix their shit instead of lame excuses.

It's good for a low end multi core, but after a lot of research I've decided to go with the proven Phenom II processor.

Re:So basically... (2)

h4rr4r (612664) | more than 2 years ago | (#37871106)

So then SSDs suck because you have to tweak the IO scheduler(elevator)?

Re:So basically... (1)

Anonymous Coward | more than 2 years ago | (#37871224)

Yes. As a user, I should not have to make esoteric workarounds for the lousy performance of your product, especially when even with the tweaks it is only marginally less crappy and still sucks more than the competition, or even your own competing product line that is cheaper. The Phenom II X6s can blow away the FX-8150 at half the price.

Re:So basically... (1)

h4rr4r (612664) | more than 2 years ago | (#37871456)

Tuning is a normal part of setting up a machine. If you don't want to do any tuning Dell will be happy to do it for you.

The Phenom 2 is probably what you should then buy.

Re:So basically... (1)

HarrySquatter (1698416) | more than 2 years ago | (#37871588)

Tuning the thread scheduler is not normal for 99% of users. This is a lame excuse by AMD for a CPU that will be a megafail. Ivy Bridge will make it look even more pathetic.

Re:So basically... (1)

h4rr4r (612664) | more than 2 years ago | (#37871668)

Users don't buy CPUs; the system builder will do this for you.

This is a pretty bad release out of AMD; let's hope they survive it.

Re:So basically... (0)

Anonymous Coward | more than 2 years ago | (#37871466)

So I should make a shittier universal product on the assumption that your shitty software will never get fixed?

Re:So basically... (3, Insightful)

turgid (580780) | more than 2 years ago | (#37872086)

Unfortunately, the Wintel world has thrived on this philosophy for 20 years.

Re:So basically... (3, Funny)

Runaway1956 (1322357) | more than 2 years ago | (#37871738)

"User". That summarizes half of the nonsense being posted here. This is a techie forum, isn't it? Techies tweak even when no tweaking is needed. If you're a "user", then you're not even authorized to be in a server room. GTFO and STAY OUT!

(listens for door slamming as the dweeb runs out)

I just hate it when children blurt out their juvenile bullshit, interrupting the adults. Happens all the time . . .

Re:So basically... (2)

DeadCatX2 (950953) | more than 2 years ago | (#37871770)

Uh...what? Users don't have to do anything to the scheduler. That's the responsibility of the operating system. A Service Pack will be released and you won't have to do shit, so your argument is moot.

Besides, if your argument is "we shouldn't have to optimize schedulers", then you're a little late, because schedulers are most definitely optimized for their associated hardware.

Re:So basically... (2)

fuzzyfuzzyfungus (1223518) | more than 2 years ago | (#37871514)

So then SSDs suck because you have to tweak the IO scheduler(elevator)?

How can you even Dream of trusting any drive that isn't good enough for solid, proven, CHS addressing?

Re:So basically... (1)

billcopc (196330) | more than 2 years ago | (#37872314)

There is a difference between a CPU upgrade and an SSD, which is not a hard drive at all and thus exhibits completely different performance characteristics. SSDs are a radical departure from the norm. A multi-core CPU is not.

I don't claim to know how CPU design works, but surely they must have ways to study or simulate real-world performance before the product is finalized and placed on shipping pallets. Windows' scheduler "sucks"? Funny, it works fine with all the other Intel and AMD systems, even chunky ones like my 12-core SMP rig. Maybe AMD should have tweaked the chip to better handle the existing scheduler, instead of revving up the spin department to compensate for the hardware's embarrassing failure.

At the low end, AMD is still king. They have been for a good while now, and I've always been happy to flog excellent power-sipping machines based on the Athlon X2/X3/X4. Maybe they should just settle for that market and quit making asses of themselves in the high-end segment. They haven't had a praise-worthy flagship ever since Intel's Conroe.

Re:So basically... (1)

X0563511 (793323) | more than 2 years ago | (#37871134)

Because "Yea! Fuck progress!" - is that what I'm hearing?

Re:So basically... (1)

HarrySquatter (1698416) | more than 2 years ago | (#37871236)

Slower performance and higher TDP equal progress?

Re:So basically... (0)

Anonymous Coward | more than 2 years ago | (#37872006)

In a nutshell, yes.

Think back to when Intel first released the P4 chips. Almost everything out of the box ran worse than the highest-clocked PIII at the time (1.13 GHz, if memory serves) even though the P4 had a significantly higher clock speed (1.6 GHz on the initial offering, if memory serves). What releasing these to the public did was enable Intel to iron out some of the stepping and fabrication problems, and give the compiler writers time to incorporate the newest architecture improvements (like SSE2).

Everything AMD is going through sounds exactly like the PIII-to-P4 transition. After 6 months to 1 year, the process will be significantly more mature and the Bulldozer chips will be serious contenders to Intel offerings.

Re:So basically... (2)

0123456 (636235) | more than 2 years ago | (#37872088)

After 6 months to 1 year, the process will be significantly more mature and the Bulldozer chips will be serious contenders to Intel offerings.

AMD just have to survive six months to a year of selling poorly-performing CPUs that have twice as many transistors as the competition.

Re:So basically... (1)

billcopc (196330) | more than 2 years ago | (#37872526)

Funny, I don't see it that way at all.

I think AMD enjoyed runaway success because of the P4, which was a very vulnerable platform for countless reasons. Poor IPC, awful thermals, and absurdly high prices. This gave AMD a giant gaping opportunity to dominate with their not-so-shitty AMD64. Then they released the dual-core, another great hit. They enjoyed nearly 4 years without any serious competition from Intel, but the moment Core 2 landed, it trounced AMD64 across the board, and came at a very reasonable price to boot. Sure, Intel learned from their mistakes, but AMD learned nothing. They still didn't have any major pull with OEMs, and their marketing arm did fuckall. The only people who even knew of AMD were gamers and techies. If I tried to sell anyone else a bang-for-the-buck AMD system, they'd ask "wtf is that garbage, I want an Intel"... user ignorance, sure, but AMD did nothing to improve their branding.

They have been playing catch-up ever since. In a year, when Bulldozer's successor comes out, Intel will also have something new to show. If AMD wants to take the performance crown, I'm fine with that idea, but they need to knock those early reviews out of the park with stellar performance. If they can't accomplish that, then stop trying and just focus on the growing value segment, where they are already known and loved.

Re:So basically... (1)

Sloppy (14984) | more than 2 years ago | (#37871358)

So basically they suck. I shouldn't need to tweak my os thread scheduler just so a cpu can suck less.

You must think the i3 and i7 suck too, then, since they have hyperthreading in addition to their multiple cores, and definitely benefit from schedulers being HT-aware. Actually, you probably think all multicore CPUs and SMP motherboards suck, since before those were widely available, the kernels in use at the time didn't know how to use more than one CPU.

AMD needs to fix their shit instead of lame excuses.

Can't argue with that; Bulldozer's performance isn't what everyone was hoping it would be.

I think what's really gone wrong with the design is that in addition to the nifty approach to integer parallelism (which I still think was a great idea and makes the chips better than they would be without it), they also decided to do the longer-pipeline thing. And it would have worked, if they had shipped the new CPUs with an extra gigahertz or two of clock speed. But they didn't, probably for the same reason Intel gave up on the same idea after the P4.

I really hope that mistake doesn't end up killing them. They have got to either get the clockspeed up, or else lower their prices/profits further.

Re:So basically... (1)

HarrySquatter (1698416) | more than 2 years ago | (#37871442)

But does the end user have to do esoteric tweaks themselves for an Intel processor with hyperthreading? Nope.

Re:So basically... (1)

h4rr4r (612664) | more than 2 years ago | (#37871498)

The system builder did when they first came out.

The user buys his machines off the shelf at Best Buy.

Re:So basically... (1)

Sable Drakon (831800) | more than 2 years ago | (#37871874)

Windows has also come with this HT awareness out of the box since Vista. AMD has quite simply screwed themselves with Bulldozer: they promised massive gains, enough to shame Intel, yet the reality is an abysmal one where not only does Intel still have the performance edge, but the previous product offers better performance for less than the cost of the newer hardware. AMD has failed, plain and simple.

Re:So basically... (1)

h4rr4r (612664) | more than 2 years ago | (#37871920)

Those CPUs existed before Vista.

Re:So basically... (1)

Sable Drakon (831800) | more than 2 years ago | (#37872018)

I'm aware of that, but XP wasn't HT-aware right out of the box. The Prescott P4s were released after XP's launch and its first service pack; even with SPs 2 and 3, that awareness was never added in. Vista was the first consumer version of Windows to incorporate it.

Re:So basically... (1)

0123456 (636235) | more than 2 years ago | (#37872078)

The original hyperthreading P4s were pretty much irrelevant because they were single-core; the OS either scheduled one thread or two based on whether hyperthreading was enabled in the BIOS, and there was nothing more complex required than that.

Re:So basically... (1)

billcopc (196330) | more than 2 years ago | (#37872884)

You're right, and yet HT processors still offered repeatable performance gains in real-world usage, even under Windows XP. HT-aware scheduling improved the margin somewhat, and narrowed the worst-case losses, but by and large Prescott showed a measurable improvement from day one. HT takes existing code and finds idle "holes" to sneak in another thread's instructions, improving performance with existing software.

Bulldozer just adds a bunch more physical cores, each one of them running slower than before, and completely ignores the fact that the majority of all desktop software, even if multithreaded, still relies on a heavy "primary" thread to do the bulk of the work. They might offload some tasks to additional threads but usually as an afterthought, cheaply tacked on to an existing codebase that predates multicore processors. Games, web browsers, office suites, media players... This is what Joe Random uses on a daily basis, and thus should be the focus of new consumer products.

The average user does not spend all day encoding video, or running "make world" for kicks. Their PR crew is spinning this half-baked hardware design as a software failure? Who are they targeting with this release? Not the gamers. Not the server crowd. Not the value segment. Not system builders. Who's left? This doesn't feel like an HPC part, not unless they cram another 8 cores on that die and deliver 4-way and 8-way boards before Q2 2012, but then they would have called it an Opteron.

Re:So basically... (1)

afidel (530433) | more than 2 years ago | (#37872700)

Actually, for first-generation HT, if you cared about performance you turned it off in the BIOS; it wasn't until Nehalem that HT actually added to performance in the majority of situations, and that was mostly from a combination of better HT-aware schedulers and genuinely better chip design.

Re:So basically... (1)

Surt (22457) | more than 2 years ago | (#37871628)

Because Intel has the leverage to get those tweaks into Windows.

Re:So basically... (3, Informative)

washu_k (1628007) | more than 2 years ago | (#37871790)

No, it's because AMD is lying to the OS. The "8 core" BD is not really 8-core; it only has 4 cores with some duplicated integer resources. Basically a better version of hyper-threading, but not a proper 8-core design.

The problem is that the BD says to Windows "I have 8 cores" and thus Windows schedules assuming that is true. If BD said "I have 4 cores with 8 threads" then Windows would schedule it just like it does with Intel CPUs and performance would improve just like in TFA.

There shouldn't need to be any OS-level tweaks, because Windows already knows how to schedule for hyper-threading optimally. If BD reported its true core count properly, then no OS-level changes would be needed.
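The HT-aware placement described above (keep two busy threads off the same physical core or module) can be sketched as a toy selection routine. This is a hypothetical illustration, not Windows' real scheduler; the function and the sibling map are invented for the example.

```python
# A toy sketch (not Windows' actual scheduler) of HT-style placement:
# given sibling pairs of logical CPUs, prefer a logical CPU whose sibling
# is also idle, so two busy threads don't end up sharing one core/module.

def pick_cpu(idle, sibling_of):
    """idle: set of idle logical CPU ids; sibling_of: dict cpu -> its sibling.
    Returns the best CPU to place a new thread on, or None if all are busy."""
    # First choice: an idle CPU whose sibling is idle too (a fully free pair).
    for cpu in sorted(idle):
        if sibling_of[cpu] in idle:
            return cpu
    # Fallback: any idle CPU, even if its sibling is busy.
    return min(idle) if idle else None

siblings = {0: 1, 1: 0, 2: 3, 3: 2}    # two pairs: (0,1) and (2,3)
print(pick_cpu({1, 2, 3}, siblings))    # CPU 0 busy -> prefer free pair: 2
print(pick_cpu({1, 3}, siblings))       # no fully free pair -> fall back to 1
```

If the hardware reports its topology as 4 cores with 8 threads, this is essentially the placement the OS already does; if it reports 8 independent cores, the scheduler has no sibling map to consult.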

Re:So basically... (3, Interesting)

Kjella (173770) | more than 2 years ago | (#37872148)

There shouldn't need to be any OS-level tweaks, because Windows already knows how to schedule for hyper-threading optimally. If BD reported its true core count properly, then no OS-level changes would be needed.

Except that hyperthreading quite obviously has one fast thread and one slow thread filling the gaps. In AMD's solution both cores in a module are equal, but they share some resources. To use a car analogy: the Intel solution is a one-lane road with pullouts, where the hyperthread sneaks from one pullout to the next while there's no traffic, while the AMD solution is a two-lane road with one-lane chokepoints. Both sort of allow cars to travel simultaneously, but I don't think the optimization would be the same.

Re:So basically... (0)

Anonymous Coward | more than 2 years ago | (#37872386)

Except that hyperthreading quite obviously has one fast thread and one slow thread filling the gaps. In AMD's solution both cores in a module are equal, but they share some resources.

What makes you think Intel has one fast thread and one slow thread filling the gaps? As far as I know, both threads share the core's resources equally and the CPU doesn't favor one over the other.

It's possible for some resource allocations to be unequal, but not due to favoritism. For example, consider the case where one thread stalls on memory accesses a lot while the other does lots of register-to-register ALU ops. HT assignment of execution slots is opportunistic as far as I know, so the reg-to-reg thread will take most of the execution slots simply because the stalled thread is hardly ever ready to do anything.
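The opportunistic slot sharing described above can be modeled in a few lines. This is purely illustrative (a made-up alternating-issue model, not how any real core arbitrates), but it shows how a frequently stalled thread naturally cedes slots without any explicit favoritism.

```python
# A toy model (purely illustrative) of opportunistic HT slot sharing:
# each cycle the core issues from whichever thread is ready; a thread
# stalled on memory skips its turn, so the ALU-heavy thread soaks up
# most of the execution slots without being "favored".

def share_slots(ready_pattern_a, ready_pattern_b):
    """Patterns are strings of 'R' (ready) / 'S' (stalled), one char per cycle.
    Returns (slots_a, slots_b): issue slots each thread received."""
    slots = [0, 0]
    prefer = 0                              # alternate when both are ready
    for a, b in zip(ready_pattern_a, ready_pattern_b):
        ready = [a == "R", b == "R"]
        if ready[prefer]:
            slots[prefer] += 1
        elif ready[1 - prefer]:
            slots[1 - prefer] += 1
        # a cycle where both threads are stalled issues nothing
        prefer = 1 - prefer
    return tuple(slots)

# Thread A stalls on memory 3 cycles out of 4; thread B is always ready.
print(share_slots("RSSS" * 4, "RRRR" * 4))   # -> (4, 12)
```

With two equally ready threads the same model splits slots evenly, which matches the claim that the imbalance comes from stalls, not from the hardware preferring one thread.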

Re:So basically... (3, Interesting)

Anonymous Coward | more than 2 years ago | (#37871728)

You did when it was initially launched. Windows 2000's scheduler does not cope well with hyperthreading /at all/ by default. You saw similar things when dual-core CPUs were launched. Now hyperthreading and multicore are standard and OSes are aware of these cases.

It's already been pointed out that Windows 8's scheduler is Bulldozer-aware and performs much better than Windows 7's. I would not be surprised to see a patch from Microsoft that specifically addresses scheduler performance for Bulldozer CPUs; we've seen similar things in the past.

By the way, I'm seeing this unusual phrase "esoteric tweaking" showing up a lot out of nowhere. It smells of astroturf. Could Intel be afraid?

Could it be that the Bulldozer architecture, with its uneven FPU-to-integer core ratio, is the key to significant future scaling above and beyond what 1:1 can offer?

Re:So basically... (1)

DeadCatX2 (950953) | more than 2 years ago | (#37871886)

Will the end user have to do esoteric tweaks after the next Service Pack for Windows? Nope.

Re:So basically... (1)

Chris Burke (6130) | more than 2 years ago | (#37872216)

Will the end user have to do esoteric tweaks after the next Service Pack for Windows? Nope.

Maybe they're saying that running Windows Update is an esoteric tweak?

I guess they should pay the teenager next door to do it for them, and then clear off all the spyware they have from running an unpatched OS.

Re:So basically... (0)

Anonymous Coward | more than 2 years ago | (#37872710)

They used to.

When they came out, hyperthreading would degrade performance on some workloads. The OS saw the two hyperthreads as two different cores, and you could get threads bouncing back and forth between them on the same physical core, destroying the cache and leading to increased pipeline stalls.

Windows 2000 had issues; the end-user tweak was upgrading to XP/2003.

Re:So basically... (1)

dpilot (134227) | more than 2 years ago | (#37871468)

No, what it means is that the software hasn't caught up to the hardware yet. Until compilers and kernels/schedulers have had time to react to Booledozer, we won't see what it's truly capable of. Since you're not interested in tracking such stuff, buy something more mainstream.

The interesting thing here is the lame excuses. Not that long ago, Intel managed to (nearly) simultaneously introduce both NetBurst and Itanium. AMD never would have survived such a debacle, and there's a serious question about whether they'll survive Booledozer, which hasn't yet gotten its chance with a proper compiler and scheduler. Yet Intel not only survived that disastrous dual introduction, they used their power and money to deny AMD's K8 the degree of business success it deserved to match its technical success.

Re:So basically... (0)

Anonymous Coward | more than 2 years ago | (#37871768)

Nice portmanteau, Booledozer - the two-state 1-bit processor.

Re:So basically... (3, Interesting)

fuzzyfuzzyfungus (1223518) | more than 2 years ago | (#37871478)

So basically they suck. I shouldn't need to tweak my os thread scheduler just so a cpu can suck less. AMD needs to fix their shit instead of lame excuses.

I've got some very bad news for you: while I have no particular knowledge of, or interest in, today's architecture pissing match, the days when the OS was allowed to ignore architectural details and expect things to just work optimally are good and over (if they ever existed in the first place).

Dynamic processor clocks? Why should I have to deal with some performance-governor shit when Intel can just make a CPU that either uses almost no power at 3GHz or runs like a bat out of hell at 800MHz? Oh, because they actually can't. Sorry.

Multiple cores? WTF? Why do they expect me to program in parallel for two 3GHz cores instead of just giving me a 6GHz core? Oh, because they actually can't. Sorry.

NUMA? Memory access times already blow! Now you want to make them unpredictable? Well, we can either repeal the speed of light and restrict every system to a single memory controller, or deal with nonuniform access times and cry into our 128GB of RAM... The list just goes on.

Hyperthreading can provide anything from less than zero improvement, if it increases contention for resources that were already being fully used, to fairly substantial improvement, if the CPU was being starved at times under a single thread. Now the Bulldozer cores have implemented something between full multi-core (with 100% duplication of resources per core) and hyperthreading (with virtually zero additional resources for the HT 'core'). Shockingly, performance depends on whether the two semi-independent cores are stepping on one another's shared toes or not...

Even if, in this specific instance, AMD happens to have fucked up and made the wrong architectural choice, that doesn't change the fact that you can't escape architectural oddities unless you are willing to stay quite far from the forefront of performance, or deal with some sort of hardware/firmware abstraction layer that ends up being at least as complex as the OS-level hackery would have been, but more likely to be vendor specific and have its cost spread across far fewer units. It certainly isn't the case that all architectural deviations are good, some are ghastly hacks best forgotten, some are perfectly OK ideas dragged down by products that overall aren't much good; but the path of progress has been liberally sprinkled with oddities that have to be accounted for somewhere in the overall stack.

Re:So basically... (1)

DarkOx (621550) | more than 2 years ago | (#37871554)

Yea, it's not like it's the operating system's job to abstract the hardware and coordinate resource sharing.

Re:So basically... (1)

beelsebob (529313) | more than 2 years ago | (#37871948)

It's not only that – they tweaked the scheduler to make one very specific benchmark perform well. Now run a different benchmark, I bet this will degrade performance.

Not only that, but I bet we could play the same trick on $intel_chip with enough fiddling with settings.

Re:So basically... (0)

Anonymous Coward | more than 2 years ago | (#37872020)

I may be completely off base, but the impression I've got from this and previous articles is that Microsoft has -ALREADY- tweaked the Windows thread scheduler for Intel's Hyper-Threading tech, and this is now only a matter of detecting Bulldozer and doing similar things for it. And I have to wonder if most of the performance gains will be made by essentially doing the -same things- (such as not putting two high loads on the same core when other cores are idle).

Re:So basically... (1)

0123456 (636235) | more than 2 years ago | (#37872060)

And I have to wonder if most of the performance gains will be made by essentially doing the -same things- (such as not putting two high loads on the same core when other cores are idle).

From the article it would appear that in other cases you'll reduce performance because that will disable 'turbo' overclocking. But the whole thing just seems too complex to optimise for because of all the special cases (e.g. don't put two integer threads on different cores, don't put two floating point threads on the same core), so that may be the best compromise.

Weird (1)

0123456 (636235) | more than 2 years ago | (#37870872)

Perhaps I'm remembering incorrectly, but I thought part of the Bulldozer hype was that it had two 'real' cores and not hyperthreading, with only a few resources shared? Yet now it turns out that you have to treat it like a hyperthreading CPU or performance sucks.

I still don't understand why AMD didn't just set the hyperthreading bit in the CPU flags, so Windows would presumably just treat it like a hyperthreading CPU in the first place.
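For reference, the "hyperthreading bit" in question is the HTT flag, bit 28 of EDX returned by CPUID leaf 1, which is what the OS consults to decide whether sibling logical processors share execution resources. A minimal sketch of checking it (the sample EDX values below are made up for illustration, not dumped from real hardware):

```python
# Decode the Hyper-Threading (HTT) flag the OS uses to decide whether
# logical processors share execution resources.
HTT_BIT = 28  # CPUID leaf 1, EDX bit 28

def reports_hyperthreading(edx: int) -> bool:
    """Return True if the HTT flag is set in the CPUID leaf-1 EDX value."""
    return bool((edx >> HTT_BIT) & 1)

# Hypothetical EDX values for illustration (not from real silicon):
edx_with_htt = 0x1FABFBFF     # bit 28 set: OS treats sibling CPUs as SMT
edx_without_htt = 0x0FABFBFF  # bit 28 clear: OS treats them as full cores

print(reports_hyperthreading(edx_with_htt))     # True
print(reports_hyperthreading(edx_without_htt))  # False
```

Had AMD set that bit, Windows 7's existing SMT-aware scheduling would have kicked in with no OS changes at all, which is exactly the parent's point.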

Re:Weird (0)

Anonymous Coward | more than 2 years ago | (#37871026)

Lying to the OS for short term gain means long term pain.

Re:Weird (1)

0123456 (636235) | more than 2 years ago | (#37871850)

Lying to the OS for short term gain means long term pain.

Shipping hardware whose performance sucks on real workloads and expecting the OS developers to fix your problem causes short-term pain that leads to long-term pain as your sales drop through the floor.

Re:Weird (2)

Sloppy (14984) | more than 2 years ago | (#37871052)

I thought part of the Bulldozer hype was that it had two 'real' cores and not hyperthreading,

No, the hype is that it blurs the distinction between cores and hyperthreading. It's both and neither.

Re:Weird (0)

Anonymous Coward | more than 2 years ago | (#37871204)

Yes. I as a user should not have to make esoteric workarounds for the lousy performance of your product. Especially when even with the tweaks it is only marginally less crappy but still sucks more than the competition or even your own competing product line that is cheaper. The Phenom II x6s can blow away the fx-8150 at half the price point.

Re:Weird (2)

laffer1 (701823) | more than 2 years ago | (#37871064)

It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

AMD's approach is faster for some workloads. The problem is that they didn't design it around how most people currently write software.

I would have preferred AMD to implement hyper threading as it would have greatly simplified things for OS developers. It's getting to a point where kernels have to know about CPU families in order to get the performance they need. They also have to know the workload.

For instance, if I'm trying to save power in a laptop, it's best with the new AMD chips to give all the instructions to the first two logical CPUs, which share the same module. Then the others can go into an enhanced sleep state. However, this is slower than distributing to different physical cores. I'm even having trouble with terminology with these chips.

With intel chips, it's best to keep the same processes on nearby cores to take advantage of cache (for those that are really 2 cpus on the same package) but to avoid scheduling them on two threads on the same core. Again the power issue comes into play with intel chips as other cores could go into C1E state or similar.

AMD did add special instructions to the bulldozer chips that speed up floating point, but compilers and applications have to take advantage of them. Microsoft's Visual Studio does not yet.
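The two policies described above (pack for power, spread for speed) can be sketched as a toy placement function. This is Python pseudoillustration, not real scheduler code, and it assumes a hypothetical topology where logical CPUs 2k and 2k+1 share a module, as on a 4-module/8-thread part:

```python
def place_threads(n_threads, n_logical=8, policy="performance"):
    """Pick logical CPUs for n_threads on a paired-core topology
    (logical CPUs 2k and 2k+1 share one module).

    "power": pack threads onto as few modules as possible so the
             idle modules can drop into a deep sleep state.
    "performance": spread threads one per module first so they don't
             contend for the shared front-end and FPU.
    """
    if policy == "power":
        # Fill module 0 (CPUs 0,1), then module 1 (CPUs 2,3), etc.
        return list(range(n_threads))
    # Spread: one thread per module first (0,2,4,6), then the siblings.
    spread = list(range(0, n_logical, 2)) + list(range(1, n_logical, 2))
    return spread[:n_threads]

print(place_threads(4, policy="power"))        # [0, 1, 2, 3] -> 2 modules busy
print(place_threads(4, policy="performance"))  # [0, 2, 4, 6] -> 4 modules busy
```

The same four threads land on two modules or four depending on the goal, which is exactly why the kernel has to know both the CPU family and the workload.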

Re:Weird (2)

0123456 (636235) | more than 2 years ago | (#37871380)

It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

Ah, so this benchmark is floating point and that's why it's faster across multiple cores?

I can't really see AMD convincing Microsoft to invest a lot of effort into dynamically tracking which threads use floating point and which don't and reassigning them appropriately. Maybe a flag on the thread to say whether it's using floating point or not at creation time would be viable, but then app developers won't bother to set it.

Re:Weird (1)

fuzzyfuzzyfungus (1223518) | more than 2 years ago | (#37871766)

To me, Bulldozer's shared-FPU design looks rather like they wanted some of the specialized-workload advantage of the UltraSPARC T-series CPUs; but with somewhat less extreme trade-offs(The T1 had a single FPU shared between 8 physical cores, which proved to be a little too extreme and was beefed up in the T2). There are a fair number of server tasks that are FPU light; but have lots of threads, often do well with a lot of RAM, and are fairly cost sensitive.

Not at all a good recipe for a workstation or scientific computing device(which shows in that some of the present Phenoms stack up uncomfortably well with the newer architecture); but there are a lot of server loads that can use as many cheap threads as you can throw at them; but don't really hit the FPU all that hard...

Re:Weird (2)

DamonHD (794830) | more than 2 years ago | (#37871978)

A T1 is still working well for me: at most about 1 thread on my entire Web server system is doing any FP at all, and in places I switched to some light-weight integer fixed-point calcs instead. That now serves me well with the same code running on a soft-float (ie no FP h/w) ARMv5.

So, for applications where integer performance and threading is far more important than FP, maybe AMD (and Sun) made the right decision...

Rgds

Damon
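The fixed-point substitution described above can be sketched like this, using a hypothetical 16.16 format (16 fractional bits, chosen purely for illustration). The core arithmetic stays integer, which is the whole point on soft-float hardware:

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 1.0 in 16.16 fixed point

def to_fixed(x: float) -> int:
    """Convert a float to 16.16 fixed point (conversion only at the edges)."""
    return int(round(x * ONE))

def fixed_mul(a: int, b: int) -> int:
    # The product carries 32 fractional bits; shift back down to 16.
    # Integer-only, so no FPU (or soft-float emulation) is touched.
    return (a * b) >> FRAC_BITS

def to_float(a: int) -> float:
    return a / ONE

a = to_fixed(1.5)
b = to_fixed(2.25)
print(to_float(fixed_mul(a, b)))  # 3.375
```

Floats appear only at the boundaries for I/O; the hot loop would stay entirely in integer ops, which is why the same code runs fine on an FPU-less ARMv5 or a mostly-FPU-starved T1.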

Re:Weird (1)

washu_k (1628007) | more than 2 years ago | (#37872024)

It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

It's a lot closer to hyper-threading than you think. The BD chips do *NOT* have two instruction decoders per module, just one. The only duplicated parts are the integer execution units and the L1 Data caches. The Instruction fetch+decode, L1 Instruction Cache, Branch prediction, FPU, L2 Cache and Bus interface are all shared.

It's pretty obvious how limited each BD "core" really is given these benchmarks. AMD should have presented the CPU as having hyper-threading to the OS.

Re:Weird (1)

Chris Burke (6130) | more than 2 years ago | (#37872160)

It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

The decoders are a shared resource in the Bulldozer core. That can be a significant bottleneck that affects integer code. Also, those integer sub-cores are still sharing a single interface to the L2 and higher up the memory hierarchy. So it's not all roses for integer apps.

Speaking of memory hierarchy, the FX parts are, like FX parts of the past, just server chips slapped into a consumer package. So the cores being studied here all have pretty substantial L3s. One of the claimed benefits of putting related threads on the same core is that they can share via the L2. Which is true, but partially mitigated by sharing on the L3.

I would expect mainstream consumer parts based on the BD core to lack an L3, and then it's more likely that scheduling integer threads from the same process on the same core will provide a bigger benefit. The one test in the article that benefited from the 0xf affinity mask should show an even bigger increase, and other tests might change which affinity is preferred.

stave me (1)

epine (68316) | more than 2 years ago | (#37872192)

I would have preferred AMD to implement hyper threading as it would have greatly simplified things for OS developers. It's getting to a point where kernels have to know about CPU families in order to get the performance they need. They also have to know the workload.

This is an architecture designed for a ten year run, much like the original P6, which underwhelmed everyone with (at most) half a brain.

Just how long do you think the OS can remain task agnostic as we head down the road to eight and sixteen core processors? Why plan for the future when we can languish on easy-street for another year or two? When the PC came out, some people complained they "would have preferred" a superior and more reliable electronic typewriter.

I'm quite certain the correct design approach is to resource a CPU regarding TDP as your performance wall. If eight floating point units require more TDP than your chip provides, what point is there in providing eight such units? And even if the math in the first spin from the new architecture could have gone the other way on some of these matters, in no time at all you're up hard against it, if you glance a few weeks further down the roadmap.

They also have to know the workload.

It's a bizarre conceit in any other walk of life that you can get away with not knowing the workload on the path to optimal resource assignment. Half of the human brain is devoted to power management. The glucose demand of the human brain is one of the big reasons why we were a late addition to mother nature's species road map. The brain doesn't operate from a baseline glucose guzzle equally able to handle any task that might come up. Much of what we perceive as quick reaction is only possible because the brain decided to fire up the necessary circuit 400ms beforehand.

"10% to 20%" boost is just overclocking processor (1)

Skarecrow77 (1714214) | more than 2 years ago | (#37870880)

The article basically says "if you schedule threads to use fewer modules, dynamic turbo will clock those modules up, giving you a performance boost."

so... anybody who is already clocking their entire cpu at top stable clock speed isn't going to get a boost out of thread scheduler modifications.

Re:"10% to 20%" boost is just overclocking process (1)

Skarecrow77 (1714214) | more than 2 years ago | (#37870924)

I take it back. apparently that's what page 1 says. There is a page 2 and it says something else entirely.

But does it actually make a difference? (2)

robot256 (1635039) | more than 2 years ago | (#37871058)

Sure, the scheduling change improves performance by 10-20% for certain tasks, but that still makes it 30-50% slower than an i7, and with more power consumption.

I can't fault AMD for not having full third-party support for their custom features, since Intel had a head-start with hyperthreading, but if it will still be an inferior product even after support is added then I'm not going to buy it.

Re:But does it actually make a difference? (1)

h4rr4r (612664) | more than 2 years ago | (#37871198)

30% slower at what percentage of the cost?
If it costs 50% as much as an i7 that might then be fine.

Re:But does it actually make a difference? (1)

AdamJS (2466928) | more than 2 years ago | (#37871262)

They generally cost between 8% less and 20% MORE than their closest performance equivalents (hard to use that word since the gap is still pretty noticeable). That's sort of part of the problem.

Re:But does it actually make a difference? (1)

HarrySquatter (1698416) | more than 2 years ago | (#37871382)

That and the fact that they are power hogs compared to even the higher-end Sandy Bridge and Phenom II processors.

Re:But does it actually make a difference? (2)

HarrySquatter (1698416) | more than 2 years ago | (#37871330)

An i7-2600K is only 15% more expensive, has a 25% lower TDP, and blows away the FX-8150 in most of the benchmarks. Even with this tweak it'll still barely compete, and the 2600K has half as many real cores and a lower clock speed.

AMD Doesn't learn from intel (1)

Anonymous Coward | more than 2 years ago | (#37871212)

I would have been content if they had shrunk the X6 core down to 32nm, slapped two of them on a chip, and sold it as a 12-core. They could have released it a year ago.

Intel did just that with their first quad core, and the consumer wasn't concerned about philosophical discussions on its cores. Heck I'm typing this message on a kentsfield chip right now and even after all these years its a great processor.

Very Sad... (1)

poly_pusher (1004145) | more than 2 years ago | (#37871300)

Why does this sound like Barcelona? Granted, Bulldozer doesn't seem to have the same breadth of architectural flaws but still. God I miss the days when AMD came out with the X2 series... There is just no way AMD can compete with Sandy Bridge. With Ivy Bridge coming up, things are not looking good for AMD. After Barcelona they need to catch up a bit however, the performance difference seems to be increasing compared with Intel's offerings.

It's a Windows limitation (3, Informative)

Animats (122034) | more than 2 years ago | (#37871408)

This is really more of an OS-level problem. CPU scheduling on multiprocessors needs some awareness of the costs of an interprocessor context switch. In general, it's faster to restart a thread on the same processor it previously ran on, because the caches will have the data that thread needs. If the thread has lost control for a while, though, it doesn't matter. This is a standard topic in operating system courses. An informal discussion of how Windows 7 does it [blogspot.com] is useful.

Windows 7 generally prefers to run a thread on the same CPU it previously ran on. But if you have a lot of threads that are frequently blocking, you may get excessive inter-CPU switching.

On top of this, the Bulldozer CPU adjusts the CPU clock rate to control power consumption and heat dissipation. If some cores can be stopped, the others can go slightly faster. This improves performance for sequential programs, but complicates scheduling.

Manually setting processor affinity is a workaround, not a fix.
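As a concrete example of what that workaround looks like, here is a Linux-only sketch using Python's os.sched_setaffinity. The 0/2/4/6 mask assumes a hypothetical 4-module chip where logical CPUs 2k and 2k+1 share a module; it intersects with the CPUs actually present so it also runs on smaller machines:

```python
import os

# Spread mask: one logical CPU per module on a hypothetical 4-module,
# 8-thread part (logical CPUs 2k and 2k+1 share a module).
spread = {0, 2, 4, 6}

# Only request CPUs this machine actually has, so the call
# cannot fail on systems with fewer logical CPUs.
available = os.sched_getaffinity(0)  # 0 = the calling process
mask = (spread & available) or available

os.sched_setaffinity(0, mask)
print(sorted(os.sched_getaffinity(0)))
```

The rough Windows equivalents are SetProcessAffinityMask and `start /affinity`, which is presumably what the article's manual tweaks amount to. Either way, it's per-process duct tape, not the scheduler-level fix Animats is describing.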

You've helped me find a SMALL "BUG" (0)

Anonymous Coward | more than 2 years ago | (#37871658)

Type start /? & see this part - the "help/manpage" for the start command!

(Mind you - I am on Windows 7 64 bit here):

SEPARATE Start 16-bit Windows program in separate memory space.
SHARED Start 16-bit Windows program in shared memory space.

* The "bug" being there ARE NO 16-bit WINDOWS SUBSYSTEMS IN 64-bit Windows..., only 32-bit subsystems...

APK

P.S.=> No "Huge Bug", but a misleading statement in the start command's help output... apk

Re:You've helped me find a SMALL "BUG" (0)

Anonymous Coward | more than 2 years ago | (#37872146)

Shut the fuck up, jackass.

No problem... (1)

reztek (1935974) | more than 2 years ago | (#37871438)

http://hardware.slashdot.org/story/11/09/13/1336210/amd-breaks-overclocking-record-with-bulldozer [slashdot.org] AMD already showed how to speed things up on their Bulldozer line

Re:No problem... (1)

HarrySquatter (1698416) | more than 2 years ago | (#37871494)

Oh goody! Now the tdp can be even worse than it already is!

Re:No problem... (1)

h4rr4r (612664) | more than 2 years ago | (#37871594)

Why do you care about a few measly watts?

Is another 50 watts really going to break your budget?
If that is the case you probably should not be buying a new computer.

Re:No problem... (1)

0123456 (636235) | more than 2 years ago | (#37871758)

Why do you care about a few measly watts?

Oddly, the AMD fanboys were making the opposite argument back in the days when you could cook your breakfast on your Pentium-4 while checking your email.

Re:No problem... (1)

h4rr4r (612664) | more than 2 years ago | (#37871950)

I own a Phenom 2 X4 and a Core 2 Quad. I buy what meets my needs at the price point I want when I want to buy it.

I am not a fanboy of either, I just want to see AMD survive so I don't have to pay far out the ass for CPUs. I owned one of those P4s at the time; I bought an Athlon that put it to shame not much later.

Re:No problem... (0)

Anonymous Coward | more than 2 years ago | (#37872766)

Might as well spend a few measly bucks and get an Intel based computer.

Windows? (1)

turgid (580780) | more than 2 years ago | (#37871576)

Windows is not exactly known for its multi-processor (multi-core) scalability.

Repeat the test with a real OS (Linux, Solaris...) and I'll be interested, especially Solaris x86 since it is known to be the best at scaling on parallel hardware.

It was already beating all intel in highly threade (5, Interesting)

unity100 (970058) | more than 2 years ago | (#37871612)

applications, like Photoshop CS5 or TrueCrypt, among others:

http://www.overclock.net/amd-cpus/1141562-practical-bulldozer-apps.html [overclock.net]

also, if you set your cpuid to GenuineIntel in some of the benchmark programs, you will get surprising results:

changing cpuid to GenuineIntel nets a 47.4% increase in performance:
[url]http://www.osnews.com/story/22683/Intel_Forced_to_Remove_quot_Cripple_AMD_quot_Function_from_Compiler_[/url]

PCMark/Futuremark rigged bentmark to favor intel:
[url]http://www.amdzone.com/phpbb3/viewtopic.php?f=52&t=135382#p139712[/url] [url]http://arstechnica.com/hardware/reviews/2008/07/atom-nano-review.ars/6[/url]

intel cheating at 3DMark vantage via driver: [url]http://techreport.com/articles.x/17732/2[/url]

relying on bentmarks to "measure performance" is a fool's errand. don't go there.

Re:It was already beating all intel in highly thre (2)

yuhong (1378501) | more than 2 years ago | (#37872452)

I think it's time for some reverse engineering of the benchmark programs, to see what exactly is happening.

No need, everyone knows... (2, Informative)

Anonymous Coward | more than 2 years ago | (#37872558)

Here's Agner Fog's page about this issue. [agner.org]

The Intel compiler (for many years and many versions) has generated multiple code paths for different instruction sets. Using the lame excuse that they don't trust other vendors to implement the instruction set correctly, the generated executables detect the "GenuineIntel" CPU vendor string and deliberately cripple your program's performance by not running the fastest codepaths unless your CPU was made by Intel. So e.g. if you have an SSE4-capable AMD CPU, it will run the SSE2 codepath instead of the SSE4 codepath that comparable Intel chips will run.

Over the years, MANY libraries (including several from Intel) have been compiled and shipped with this compiler, with the result that the applications compiled with those libraries, including many benchmarks, also suffer from the same performance sabotage.
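The dispatching behavior being described reduces to logic like the following. This is a Python simulation of the pattern Agner Fog documents, with invented function and feature names for illustration, not decompiled ICC code:

```python
def pick_codepath(vendor: str, features: set) -> str:
    """Simulate the CPU dispatcher described above: the fastest code
    paths are gated on the vendor string, not just the feature bits."""
    if vendor == "GenuineIntel":
        if "sse4" in features:
            return "sse4"
        if "sse2" in features:
            return "sse2"
    # Any non-Intel vendor falls through to a baseline path, even if
    # its feature flags advertise the faster instruction sets.
    return "sse2" if "sse2" in features else "x87"

amd_features = {"sse2", "sse4"}
print(pick_codepath("AuthenticAMD", amd_features))  # "sse2" -> crippled
print(pick_codepath("GenuineIntel", amd_features))  # "sse4" -> fast path
```

Same silicon capabilities, different code path, which is why faking the "GenuineIntel" vendor string moves benchmark scores at all.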

Windows 7 hotfix (0)

Anonymous Coward | more than 2 years ago | (#37872330)

With Windows it's always wait until the next version of Windows to get new features. MS could easily hotfix Windows 7 with a Bulldozer-aware scheduler. They won't, because they want something to drive sales of their "new and improved" OS. This is just one of the various reasons why I use Linux. In about a month Linux (kernel 3.2) will roll out a Bulldozer-aware scheduler and I'm set.

Interestingly enough, if you look at the current Bulldozer benchmarks on Linux, it's performing quite nicely (even without the Bulldozer tuned scheduler).
