
Not All Cores Are Created Equal

kdawson posted more than 5 years ago | from the working-out-the-kinks dept.


joabj writes "Virginia Tech researchers have found that the performance of programs running on multicore processors can vary from server to server, and even from core to core. Factors such as which core handles interrupts, or which cache holds the needed data can change from run to run. Such resources tend to be allocated arbitrarily now. As a result, program execution times can vary up to 10 percent. The good news is that the VT researchers are working on a library that will recognize inefficient behavior and rearrange things in a more timely fashion." Here is the paper, Asymmetric Interactions in Symmetric Multicore Systems: Analysis, Enhancements and Evaluation (PDF).
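The run-to-run variance the summary describes is easy to observe directly. A minimal sketch (Python for brevity; the squaring loop is a toy stand-in for a real workload, not the VT benchmark):

```python
import time

def run_once(reps=200_000):
    # Toy CPU-bound workload; a stand-in for a real program.
    total = 0
    for i in range(reps):
        total += i * i
    return total

def spread_percent(runs=5):
    # Time the identical workload several times and report the gap
    # between the fastest and slowest run as a percentage of the fastest.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    return 100.0 * (max(times) - min(times)) / min(times)
```

On a busy multicore box the spread can land in the single-digit percent range the paper reports, depending on which core and cache the scheduler happens to hand you each run.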




unsurprising. (5, Interesting)

Anonymous Coward | more than 5 years ago | (#26207751)

Anyone who thinks computers are predictably deterministic hasn't used a computer. There are so many bugs in hardware and software that cause them to behave differently than expected, documented, or designed. Add to that inevitable manufacturing defects, however microscopic, and it would be unimaginable to find otherwise.

It's like discovering "no two toasters toast the same: researchers found some toasters browned toast up to 10% faster than others."

Re:unsurprising. (5, Funny)

Rod Beauvex (832040) | more than 5 years ago | (#26207757)

It's those turny knobs. They lie.

Re:unsurprising. (5, Funny)

symbolset (646467) | more than 5 years ago | (#26207795)

You have to buy the one that goes to 11. You know how 10 makes the toast almost totally black? Well, what if you want your toast just a little bit more crispy? What if you want just that little bit more? That's what 11 is for. Those other toasters only go to 10, but this one goes to 11.

Re:unsurprising. (4, Funny)

MightyYar (622222) | more than 5 years ago | (#26207811)

I had a Pentium that DEFINITELY went to 11.

Re:unsurprising. (5, Funny)

RuBLed (995686) | more than 5 years ago | (#26207937)

mine only went up to 10.998799799

Re:unsurprising. (2, Funny)

Anonymous Coward | more than 5 years ago | (#26209145)

Wow, a joke from 1995. It's true, Slashdot is at the forefront of cutting-edge humor.

Re:unsurprising. (1)

Fluffeh (1273756) | more than 5 years ago | (#26207943)

Was it one of those PII Celeron 300A's [wikipedia.org] that just ran and ran and ran even if you pushed them up from 300 MHz to 450 MHz?

Those things were HAWT!

Re:unsurprising. (1)

Kent Recal (714863) | more than 5 years ago | (#26208243)

Hell yeah, that one was a bargain.
I had mine clocked at 400MHz and iirc saved about $200 over an equivalent "real" PII.

Re:unsurprising. (0)

Anonymous Coward | more than 5 years ago | (#26209037)

525 was the top I could get mine air cooled and still be stable. Ran for two years before I clocked it down to 450 and sold it. About 2 years later, I bought it back, reclocked it to 525, and used it for a closet server.


Re:unsurprising. (1)

kimvette (919543) | more than 5 years ago | (#26209021)

I had an Abit motherboard (VP6) that went to 11. Unfortunately it ended with a little fireworks show. :( Stupid bad caps, lousy Abit QC.

Re:unsurprising. (1)

Anonymous Coward | more than 5 years ago | (#26207971)

You have to buy the one that goes to 11. You know how 10 makes the toast almost totally black? Well, what if you want your toast just a little bit more crispy?

It's like how much more black could this toast be? And the answer is none. None more black.

Re:unsurprising. (2, Insightful)

ElectricTurtle (1171201) | more than 5 years ago | (#26207841)

Mod parent to 5, seriously, it's so true. More than a few times after working support for a decade I've had to say "that should be impossible," and yet the symptom nonetheless exists.

Re:unsurprising. (5, Interesting)

aaron alderman (1136207) | more than 5 years ago | (#26208369)

Impossible like "xor eax, eax" returning a non-zero value and crashing Windows? [msdn.com]

Re:unsurprising. (5, Funny)

$RANDOMLUSER (804576) | more than 5 years ago | (#26208421)

Moral of the story: There's a lot of overclocking out there, and it makes Windows look bad.

Oh. So that's what's been doing it.

Re:unsurprising. (1)

Anthony_Cargile (1336739) | more than 5 years ago | (#26208469)

Very interesting story, wish I had some mod points right now :). I think I found a new blog to subscribe to, only this one has a purpose!

Oh, and mod the comment above me up as well - that was just funny.

Re:unsurprising. (2, Interesting)

$RANDOMLUSER (804576) | more than 5 years ago | (#26207861)

I remember HP-UX on PA-RISC from at least ten years ago making efforts to reassign a swapped out process to the processor that it had been running on before it was swapped out, on the notion that some code and data might still be in the cache. SMP makes for some interesting OS problems.
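Modern Linux exposes the same knob from user space. A sketch of pinning a process to one core so its cached code and data stay warm (an illustration of the idea, not HP-UX's actual scheduler; `os.sched_setaffinity` is Linux-only):

```python
import os

def pin_to_core(pid, core):
    # Restrict `pid` (0 means the calling process) to a single core, so
    # its cache contents stay warm on that core between timeslices.
    old = os.sched_getaffinity(pid)    # remember the previous allowed set
    os.sched_setaffinity(pid, {core})
    return old                         # caller can restore this later
```

Calling `os.sched_setaffinity(0, old)` afterwards undoes the pinning.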

Re:unsurprising. (2, Informative)

Majik Sheff (930627) | more than 5 years ago | (#26208543)

Processor affinity is still a nasty corner of OS design. It was one of the outstanding issues with the BeOS kernel that was not resolved before the company tanked.

Re:unsurprising. (1)

ClosedSource (238333) | more than 5 years ago | (#26208053)

Actually, the PC was designed to be non-deterministic. No software bugs, hardware bugs or manufacturing defects needed.

On the other hand, many early home computers were quite deterministic. In fact the Atari 2600 game machine was deterministic down to a single CPU cycle. Many 2600 games would not have worked if it were otherwise.

Re:unsurprising. (1)

TapeCutter (624760) | more than 5 years ago | (#26208151)

"It's like discovering "no two toasters toast the same. Researches found some toasters browned toast up to 10% faster than others."

What we need is a toaster with an IQ of around 4000.

Re:unsurprising. (0)

Anonymous Coward | more than 5 years ago | (#26208461)

What we need is a toaster with an IQ of around 4000.

Who the smeg would want that?

Re:unsurprising. (1)

aaron alderman (1136207) | more than 5 years ago | (#26208343)

I prefer to brown bread myself.
As a physicist I don't see why computers aren't deterministic. After all, you just start with a spherically symmetric computer...

Re:unsurprising. (5, Interesting)

zappepcs (820751) | more than 5 years ago | (#26208433)

Actually, (sorry, no link) there was a researcher who was using FPGAs and AI code to create simple circuits, with the goal of having the AI design them. What he found is that due to minor manufacturing defects, the design the AI evolved was dependent on the particular FPGA it was tested on and would not work on just any FPGA of that specification, even after 600 iterations. One experiment ran for a long time, and when he analyzed the AI-generated design at the end, there were 5 paths/circuits inside that did nothing. If he disabled any or all of the 5, the overall design failed. Somehow the AI had found that creating these do-nothing loops/circuits caused favorable behavior in other parts of the FPGA that made for overall success. Naturally that design would not work on any other FPGA of the specified type. It was an interesting read, sorry that I don't have a link.

Re:unsurprising. (1)

bm_luethke (253362) | more than 5 years ago | (#26209465)

You know, there have been a few cases of trying to work with some Open Source software that I find the following bit of logic in there:

If (1){
do stuff
more stuff

(well, other than any syntax errors - being dyslexic if I write two lines without them then I'm doing good)

And I never could figure out the whole "if(1)". I always left it in the code because I figured someone somewhere had a reason, and who am I to change it? I recall hearing Donald Becker rant about people taking "worthless" code out of his drivers when it was there for some specific architecture. Though in this case I have always thought that someone was too lazy to change it initially (after all, you had to find the other "}"), and everyone after them had the same idea I did.

Now I know for sure: some AI someplace added in some code that no one else understands, and it must stay in, in its own little world. But then I guess that is something along the lines of Becker's complaint: it didn't hurt other hardware, yet was required for some specific vendor.

I'm loath to change working code, even when it has something like the above.

who would've guessed... (4, Insightful)

Eto_Demerzel79 (1011949) | more than 5 years ago | (#26207755)

...programs not designed for multi-core systems don't use them efficiently.

Re:who would've guessed... (0)

Anonymous Coward | more than 5 years ago | (#26208041)

In Visual Studio just drag another core from the toolbox into the application and voila!

Re:who would've guessed... (4, Insightful)

timeOday (582209) | more than 5 years ago | (#26208439)

No, the programs are not the problem. The programmer should not have to worry about manually assigning processes to cores or switching a process from one core to another - in fact, there's no way the programmer could do that, since it would require knowing what the system load is, what other programs are running, and physical details (such as cache behavior) of processors not even invented yet. This is all the job of the OS.

Re:who would've guessed... (1)

PhrostyMcByte (589271) | more than 5 years ago | (#26208629)

The OS can only do so much. Most programs have downright horrible scaling on just 4 cores, let alone the 64 cores of 5 years from now. If you want to be scalable, you need to learn how to do it and design your app for it from the start.

Re:who would've guessed... (0)

Anonymous Coward | more than 5 years ago | (#26208799)

The problem isn't just the OS or the software being run -- it's the cache. What this article is about is what everybody already knew: different workloads create different cache efficiency. Not exactly the revelation of the century.

Re:who would've guessed... (1)

Splab (574204) | more than 5 years ago | (#26209417)

Actually, it is the job of the programmer to make sure his program is cache friendly; that should work on all architectures.

Also, in a multi-core/-CPU environment you should make sure the data you need is close to where you are. That means fetching it from whatever storage it is in (RAM, HDD, another core's cache) as early as possible, and non-blocking if possible, so you can complete other tasks while waiting.

While the OS can help you with some tasks, there is no way for the OS to know what data you need next. So if you want high performance you have to program for it, and while you don't always have direct access to memory, you can be pretty sure most hardware works the same way, with the same drawbacks, so generalized optimizations for fetching/pushing data, for cache usage, etc. should work across the board.
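The fetch-early, block-late pattern described above can be sketched with a thread pool (Python; `fetch` here is a toy stand-in for whatever storage you're actually reading from, not a real API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(key):
    # Stand-in for a slow load from disk, RAM, or another core's cache.
    time.sleep(0.01)
    return key * 2

def process_with_prefetch(keys):
    with ThreadPoolExecutor() as pool:
        # Kick off every fetch as early as possible...
        futures = [pool.submit(fetch, k) for k in keys]
        # ...other useful work could run here while fetches are in flight...
        # ...and block only when the results are actually needed.
        return [f.result() for f in futures]
```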

Re:who would've guessed... (1, Insightful)

Anonymous Coward | more than 5 years ago | (#26208531)

Summary (I didn't RTFA) says that the performance of a program can vary depending on which core it is executing on. No mention of multi-threading or using multiple cores at once. The article is not about programs using cores efficiently; it is about the unpredictability of and differences between seemingly identical cores, and how the OS can detect and correct those problems.

make -j 3 (0, Offtopic)

kevind23 (1296253) | more than 5 years ago | (#26207789)

Works fine for me.

Re:make -j 3 (1)

aliquis (678370) | more than 5 years ago | (#26207913)

And this is useful info because?

Isn't most of the point of the -j parameter that your machine can carry on compiling something else while whatever it did earlier gets the resources it needs from disk or similar? Will it really help with cache usage?

Should more processes mean better or worse cache performance? Worse because cache is shared between them, better because if something is missing some other instruction can be done while the needed data is fetched from RAM?

Re:make -j 3 (1)

kevind23 (1296253) | more than 5 years ago | (#26208221)

The point of using this is so that you can compile multiple files at once. Obviously it can't impact how the application performs because that would require modification of the source code, and the compiler doesn't magically optimize it to work with multiple cores.

Re:make -j 3 (1)

Anthony_Cargile (1336739) | more than 5 years ago | (#26208117)

I believe this allows make to make use of several cores, not the actual application being compiled. More specifically, -j means "jobs" and therefore not necessarily "cores" per se, but you could always manually tweak the affinity yourself if you're compiling something absolutely huge.

multicore dev is fun... much like prison rape! (4, Interesting)

Shadowruni (929010) | more than 5 years ago | (#26207809)

The current state of dev reminds me of the issues that Nintendo had with the N64... a beautiful piece of hardware with (at the time) a God-like amount of raw power, but *REALLY* hard to code for. Hence the really interesting titles for it came either from Rare, who developed on SGI machines (an R10000 drove that beast), or Nintendo, who built the thing.

/yeah yeah, I know the PS1 and Sega Saturn had optical media, and that the media's storage capacity, which led to better and more complex games, was truly what killed the N64.

//bonus capt was arrestor

Re:multicore dev is fun... much like prison rape! (1)

aliquis (678370) | more than 5 years ago | (#26207919)

Could you point me in some direction for more information about the problems of developing for the N64? I know developers didn't like the Sega Saturn, or whichever one it was that had multiple cores, but I don't remember reading anything about the N64.

Re:multicore dev is fun... much like prison rape! (4, Interesting)

carlzum (832868) | more than 5 years ago | (#26208049)

I believe the biggest problem with multi-core development is a lack of maturity in the tools and libraries available. Taking advantage of multiple cores requires a lot of thread management code, which is great for highly optimized applications but deters run-of-the-mill business and user app developers. There was a recent opinion piece [ddj.com] in Dr. Dobb's discussing the benefits of concurrency platforms that I found interesting. The article is clearly promoting the author's company (Cilk Arts), but I agree with his argument that the complexities of multi-core development need to be handled in a framework and not in applications.

Yup (1)

coryking (104614) | more than 5 years ago | (#26208553)

The libraries and the languages currently make threading harder than it needs to be.

How about a "parallel foreach(Thing in Things)"?

I realize there are locking issues and race conditions, but really, I think the languages could go some way toward hiding things like this. Oh wait, does that mean I'm advocating making programming languages more user friendly? I guess so. You know why people use Ruby, C# or Java? Cause those are way more user friendly than C++ or COBOL.

The usability of a programming language matters a lot. Nobody uses threading because the current crop of programming languages makes it complex, confusing, and full of ways to shoot yourself in the foot. Make threading user friendly, and we might see more people create multi-threaded apps.
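For what it's worth, the wished-for construct is nearly a one-liner over a thread pool in most modern libraries; a sketch (Python; `parallel_foreach` is a made-up name, not a standard API):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_foreach(action, things, workers=4):
    # Apply `action` to every item on a small thread pool; results come
    # back in input order, so it reads like an ordinary foreach.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(action, things))
```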

Re:Yup (3, Informative)

cetialphav (246516) | more than 5 years ago | (#26208741)

How about a "parallel foreach(Thing in Things)" ?

That is easy. If your application can be parallelized that easily, then it is considered embarrassingly parallel. OpenMP exists today and does just this. All you have to do (in C) is add a "#pragma" above the for loop and you have a parallel program. OpenMP is commonly available on all major platforms.

The real problem is that most desktop applications just don't lend themselves to this type of parallelism and so the threads have lots of data sharing. This data sharing causes the problem because the programmer must carefully use synchronization primitives to prevent race conditions. Since the programmer is using parallelism to boost performance, they only want to introduce synchronization when they absolutely have to. When in doubt, they leave it out. Since it is damn near impossible to test the code for race conditions, they have no indication when they have subtle errors. This is what makes concurrent programming so difficult. One researcher says that using threads makes programs "wildly nondeterministic".
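The shared-data hazard described above is easy to reproduce: several threads doing a read-modify-write on one counter. A sketch (Python; with the lock the answer is deterministic, and dropping it is exactly the "leave it out" mistake that can silently lose updates):

```python
import threading

def locked_counter(n_threads=4, increments=10_000):
    count = 0
    lock = threading.Lock()

    def work():
        nonlocal count
        for _ in range(increments):
            with lock:   # the synchronization the programmer must not forget
                count += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count         # always n_threads * increments with the lock held
```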

It is hard to blame the programmers for being aggressive in seeking performance gains because Amdahl's Law [wikipedia.org] is a real killer. If you have 90% of the program parallelized, the theoretical maximum performance gain is 10X no matter how many cores you can throw at the problem.
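Amdahl's Law from the comment above, as a formula: speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the core count. A one-function sketch:

```python
def amdahl_speedup(parallel_fraction, cores):
    # speedup = 1 / ((1 - p) + p / n); as cores grows, the serial
    # fraction (1 - p) dominates and caps the gain at 1 / (1 - p).
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / cores)
```

With p = 0.9 the cap is 10x no matter how many cores you throw at it, exactly as the comment says.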

But that is ugly (1)

coryking (104614) | more than 5 years ago | (#26209309)

And OpenMP isn't "standard" as far as I'm concerned. Plus it makes you think about threading and it only works in low-level languages like C.

I'm talking about this highly useful code (which is written in a bastardized version of C#, Perl and Javascript for your reading pleasure):

List pimpScores = PimpList.ThreadedMap(function(aPimp){
    # score how worthy this guy is at pimpin'
    if (aPimp.Hoes > 10) {
        return String.Format("Damn brother, {0} is a player", aPimp.PimpName);
    } else if (aPimp.Hoes > 0) {
        return String.Format("{0} is a small time player", aPimp.PimpName);
    } else {
        return String.Format("{0} isn't a player at all!", aPimp.PimpName);
    }
});
Look how easy it was to turn a transform like Map into something threaded (even though C# doesn't have Map... I forget what LINQ method does the same transform)

OpenMP doesn't offer anything as intuitive as that. It makes you think long and hard about threading in a dull, dry manner. Threading would be everywhere in our code if the programming language made it obvious and easy.

Re:Yup (1)

gfody (514448) | more than 5 years ago | (#26208985)

what you're asking for is pretty much already that easy

foreach (Thing thing in things)
    new Thread(thing.DoStuff).Start();

Close (2, Interesting)

coryking (104614) | more than 5 years ago | (#26209077)

But you have to think about it too much.

How about:

  Console.Write("{0} is cool, but in parallel", thing);
  # serious business goes here

There are lots of stupid loop structures in desktop apps that are just begging to be run in parallel, but the current crop of languages doesn't make it braindead easy to do so. Make it so every loop structure has a trivial and non-ugly (unlike OpenMP pragmas) way of doing it.

Also, IMHO, not enough languages do stuff like JavaScript's Array.Each(function(element){}). Am I blind, or is this construct missing from C#?

Re:Close (1)

Marillion (33728) | more than 5 years ago | (#26209223)

I agree. I've ranted about this before. 99% of languages implement multi-threading through function calls. Class method calls, in this case, are merely glorified function calls. Multi-threading should be handled at the same level as other flow-control statements, because that's what it's most like.

Re:Yup (0)

Anonymous Coward | more than 5 years ago | (#26209075)

If separate threads were automated to the point that braces have automated gotos/jumps (to the point where we don't even worry about how many function calls are made, because braces even look fun), then it would be a breakthrough.

Imagine just coding something where you have like:

-> sharing (myLootDataStructure var)
-> splitToCores { work }
-> accumulate (whateverYouNeedIntoResult)

where you preferably didn't need to define how stuff is shared, split or accumulated -- just specify your stuff once, and let some magic do the "splitToCores" part dynamically. The same way modern programming languages hide memory addresses behind variable names and scopes. I mean, if we can multicore program at all, then it's just a matter of some serious PhD work to define a new model and put a layer around stuff.
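That sharing/splitToCores/accumulate shape is essentially map-reduce, which existing libraries can already spell out; a sketch (Python multiprocessing; the names `score` and `split_and_accumulate` are illustrative, not a real API):

```python
from multiprocessing import Pool

def score(item):
    # The "work" step; top-level so it can be shipped to worker processes.
    return item * item

def split_and_accumulate(data):
    with Pool() as pool:                  # splitToCores
        partials = pool.map(score, data)  # work, spread across cores
    return sum(partials)                  # accumulate
```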

Only less ugly :-) (1)

coryking (104614) | more than 5 years ago | (#26209099)

And for those who say "but what about all the weird race conditions and stuff": I'm not a computer science major, so I'm going out on a limb asking this, but what if we actually used some of this new CPU power in our IDEs and our JIT compilers? Couldn't our languages watch out for most of the nasty ways we can shoot ourselves in the foot? Like if I do an Array.ThreadedEach(function(element){}) and I'm changing some shared data, couldn't the compiler or IDE let me know at compile time, or while I'm writing the code? Obviously you'd need a strongly typed language like C# or Java to pull such stunts; you couldn't do it in perl :-)...

The goal is to make this threaded stuff usable. I think we can do it.

Re:multicore dev is fun... much like prison rape! (1, Offtopic)

Fallingcow (213461) | more than 5 years ago | (#26208727)

The N64 was killed?

Best "party game" system of that generation, easily.

4 controller capability out of the box, 007 Goldeneye, Perfect Dark, Mario Kart, all the good wrestling games (hey, they were fun at the time...) etc.

The PS1 was only good for racing games and RPGs, IMO. Oh, and Bushido Blade 1 and 2.

Kind of like the Wii vs. 360/PS3. Any time we plug in a PS3 at a get-together, it's to ooh and ah over the graphics and maybe take turns playing the single player mode of a cool game (Need for Speed or something). When the Wii's plugged in it's so we can all play games together.

Then again, no one I know likes console shooters, especially ones that don't do split-screen (and if they do, they better dumb it down like Goldeneye/Perfect Dark so it's fun rather than frustrating with the damn broken console controller--we all like PC shooters), so that may be why we don't get any multiplayer action out of those other consoles.

/ I see you are a fark.com user, too // Slashies right back at ya!

Linux and Windows (3, Insightful)

WarJolt (990309) | more than 5 years ago | (#26207923)

I don't know whether Linux or Windows has an automatic mechanism to schedule tasks based on processor caches, but the study didn't even mention Windows. Seeing that scheduling and cache management are OS problems, this seems kind of important.

The other thing that seems odd is that they were using a 2.6.18 kernel, and 2.6.23 added the Completely Fair Scheduler, which could potentially change their results. It doesn't seem logical to base a cutting-edge study on stuff that was released years ago.

Re:Linux and Windows (1)

Anthony_Cargile (1336739) | more than 5 years ago | (#26208161)

I agree, and seeing this in the standard C/C++ libraries down the road would be nice. I would say Java would have framework-esque multicore support first, but then again Sun is in trouble and Java is just now getting video and 64-bit support. I don't use .NET enough to know, but it would be interesting to know if .NET has decent native multicore support and if Mono implements it correctly, although this all depends on MSIL versioning/limitations I'm sure.

In a nutshell, we need more portable multicore solutions in order to make better usage of them. Not just for the sake of being cross-platform, but for better documentation, example code, etc.

Re:Linux and Windows (1)

nategoose (1004564) | more than 5 years ago | (#26208271)

Last time I read anything about it (which was years ago), Linux's cache-aware scheduling consisted of trying to get tasks scheduled on the same processor they ran on previously. This works well for a lot of things, but you lose a lot of the benefit when multiple simultaneous tasks are working on the same data, since those tasks would be spread across the processors to take advantage of concurrency. This is just an engineering trade-off.

Re:Linux and Windows (1)

nabsltd (1313397) | more than 5 years ago | (#26208277)

I don't know if Linux or Windows has an automatic mechanism to schedule task priority based on processor caches, but the study didn't even mention Windows. Seeing that the scheduling and managing the caches are OS problems this seems kind of important.

I'm not sure why this article isn't tagged "duh".

It's pretty obvious from looking at the CPU graphs of my VMware ESX servers that their code does some optimization to keep processes on the same core, or at the very least on the same CPU.

This data is from a dual-socket quad-core AMD (8 total cores), which means a NUMA [wikipedia.org] architecture, so running the code on the same CPU means you have faster memory access.

So, some commercial code that has been around for nearly 4 years takes advantage of the "discoveries" in an article published this month.

Re:Linux and Windows (3, Informative)

swb (14022) | more than 5 years ago | (#26208391)

They mentioned this in an ESX class I took. I seem to remember it in the context of setting processor affinity or creating multi-CPU VMs: either the hypervisor is smarter than you (e.g., don't set affinity), or multi-CPU VMs can actually slow other VMs, because the hypervisor tries to keep a multi-CPU VM on the same socket, thus denying execution priority to other VMs (e.g., don't assign SMP VMs just because you can, unless you have the CPU workload).

Linux schedules better than this (3, Informative)

bluefoxlucid (723572) | more than 5 years ago | (#26207927)

Last I checked, Linux was smart enough to try to keep programs running on cores where cache contained the needed data.

Re:Linux schedules better than this (4, Interesting)

nullchar (446050) | more than 5 years ago | (#26207977)

Possibly... but it appears an SMP kernel treats each core as a separate physical processor.

Take an Intel Core2 Quad machine and start a process that takes 100% of one CPU. Then watch top/htop/gnome-system-monitor/etc where you can watch the process hop around all four cores. It makes sense that the process might hop between two cores -- the two that share L2 cache -- but all four cores doesn't make sense to me. Seems like the L2 cache is wasted when migrating between each core2 package.
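On Linux you can watch that hopping yourself: field 39 of /proc/&lt;pid&gt;/stat is the core the task last ran on. A parsing sketch (splitting after the final ')' so a process name containing spaces can't shift the fields; Linux-only, of course):

```python
def last_cpu(stat_text):
    # /proc/<pid>/stat: "pid (comm) state ..."; `processor` is field 39
    # overall, i.e. the 37th token after the comm field's closing paren.
    rest = stat_text.rsplit(')', 1)[1].split()
    return int(rest[36])
```

Calling `last_cpu(open('/proc/self/stat').read())` in a loop shows the process migrating between cores.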

Re:Linux schedules better than this (3, Interesting)

Krishnoid (984597) | more than 5 years ago | (#26208169)

Wasn't there an article recently about this describing that if only one core was working at peak capacity that the die would heat unevenly, causing problems?

Re:Linux schedules better than this (0)

Anonymous Coward | more than 5 years ago | (#26208195)

Why do people bother commenting on technical subjects they know nothing about?

Re:Linux schedules better than this (0)

Anonymous Coward | more than 5 years ago | (#26208229)

Isn't that why the new AMD cores all share the same cache, to avoid this 'unavailable' cached data? (I hope they get their act together. AMD can design but they can't execute. Intel can execute in numbing volumes but their designs leave a lot to be desired. And Motorola is probably still at 500MHz.)

Re:Linux schedules better than this (1)

RAMMS+EIN (578166) | more than 5 years ago | (#26209323)

``And Motorola is probably still at 500MHz.''

Actually, they gave up on the desktop CPU market. They spun off their chip division into Freescale Semiconductor [freescale.com] , which now makes embedded processors.

Re:Linux schedules better than this (0)

Anonymous Coward | more than 5 years ago | (#26208743)

This is why a true quad core architecture beats two dual cores glued together. Of course, it does help to release that true quad core on time and at promised speeds....

Re:Linux schedules better than this (1)

Anthony_Cargile (1336739) | more than 5 years ago | (#26208353)

The article uses a kernel version that predates the Completely Fair Scheduler; that would be why. If they aim to test something like this, they need to test the most recent version.

Re:Linux schedules better than this (1)

bluefoxlucid (723572) | more than 5 years ago | (#26208365)

Your PAUSE() function will spin indefinitely instead of continuing.

Re:Linux schedules better than this (1)

Anthony_Cargile (1336739) | more than 5 years ago | (#26208521)

Exactly. If Slashdot gave me more room, I would have put the rest of the joke on there:

void PAUSE(){ printf("\nPress any key to continue. . ."); while(1) getch(); } // Enforce the 'any' key

What's even worse is that this line of code was used in a fake cmd.exe [anthonycargile.info] I made as a prank on a friend's computer. It was tricky to install, since I had to point the COMSPEC environment variable to a backed-up copy of the real cmd.exe and tinker with the dllcache directory, but it was priceless to see his reaction to the fake ping error :D.

Re:Linux schedules better than this (1)

timeOday (582209) | more than 5 years ago | (#26208513)

Last I checked, Linux was smart enough to try to keep programs running on cores where cache contained the needed data.

As if simply giving each process affinity for a given core solves the problem. But then you have interrupt handling, job loads with more than one process per core, multi-threaded programs (all sharing memory space yet with different memory access patterns), and different processors with, e.g., different cache architectures. The task-switching OS is 50 years old and we still haven't settled on THE perfect scheduler; now you suggest that this problem, with several more degrees of freedom due to multi-core, is solved by a trivial heuristic.

NUMA NUMA (3, Informative)

Gothmolly (148874) | more than 5 years ago | (#26207979)

Linux can already deal with scheduling tasks to processors where the necessary resources are "close". It may not be obvious to the likes of PC Magazine, but it's trivially obvious that even multithreaded programs running on a non-location-aware kernel are going to take a hit. This is a kernel problem, not an application library problem.


This isn't news (5, Informative)

nettablepc (1437219) | more than 5 years ago | (#26207989)

Anyone who has been doing performance work should have known this. The tools to adjust things like core affinity and where interrupts are handled have been available in Linux and Windows for a long time. These effects were present in 1980s mainframes. DUH.
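On Linux, "where interrupts are handled" is visible in /proc/interrupts; a parser sketch for the common layout (the file's exact format varies by kernel, so this is illustrative only):

```python
def irq_counts(interrupts_text):
    # Header row names the CPUs; each following row is
    # "<irq>: <count per cpu>... <description>".
    lines = interrupts_text.strip().splitlines()
    n_cpus = len(lines[0].split())
    table = {}
    for line in lines[1:]:
        parts = line.split()
        irq = parts[0].rstrip(':')
        table[irq] = [int(x) for x in parts[1:1 + n_cpus] if x.isdigit()]
    return table
```

A lopsided count column is exactly the "one core fields all the interrupts" effect the article measured.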

Re:This isn't news (5, Insightful)

Clover_Kicker (20761) | more than 5 years ago | (#26208095)

80s mainframe tech is NEW and EXCITING to a depressing number of tech people, look at how excited everyone got when someone remembered and re-implemented virtualization.

Re:This isn't news (1, Informative)

Anonymous Coward | more than 5 years ago | (#26208911)

80s mainframe tech is NEW and EXCITING to a depressing number of tech people, look at how excited everyone got when someone remembered and re-implemented virtualization.

Ummm, that's re-implemented virtualization on x86 with very little performance overhead and at a very reasonable cost. That was new and exciting.

And while I did use CICS and MVS back in the day, I don't think IBM had technology (maybe they did, but I never heard of it) like VMware's vMotion, where you can take a running virtual machine and move it from one host to another.

Processor affinity isn't new. Quite a few applications have settings for that, even Microsoft SQL Server 2000.

Re:This isn't news (0)

Anonymous Coward | more than 5 years ago | (#26208517)

Well, I didn't have a mainframe in the 80s.

it's the affinity (2, Informative)

non-e-moose (994576) | more than 5 years ago | (#26208063)

It's just an Insel Intide thing. DAAMIT processors are more predictable. Or not. If you don't use numactl(1) to force socket (and memory) affinity, you get exactly what you ask for: randomly selected sockets and unpredictable performance.

not a surprise (5, Insightful)

Eil (82413) | more than 5 years ago | (#26208119)

Here's an exercise: Take 2 brand-new systems with identical configurations and start them at the same time doing some job that takes a few hours and utilizes most of the hardware to some significant degree. Say, compiling some huge piece of code like KDE or OpenOffice. System administrators who do exactly this will tell you that you'll almost never see the two machines complete the job at precisely the same time. Even though the CPU, memory, hard drive, motherboard, and everything else is the same, the system as a whole is so complex that minute differences in timing somewhere compound into larger ones. Sometimes you can even reboot them and repeat the experiment and the results will have reversed. It shouldn't come as a surprise that adding more complexity (in the form of processor cores) would enhance the effect.
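The run-to-run jitter described above is easy to measure yourself. A rough sketch - the workload and repeat count here are arbitrary stand-ins, not a real benchmark methodology:

```python
import statistics
import time

def workload():
    # Stand-in for a real job; any CPU-bound loop shows the effect.
    return sum(i * i for i in range(200_000))

times = []
for _ in range(10):
    start = time.perf_counter()
    workload()
    times.append(time.perf_counter() - start)

mean = statistics.mean(times)
spread = (max(times) - min(times)) / mean
print(f"mean {mean:.4f}s, spread {spread:.1%} of mean")
```

Run it twice on two "identical" boxes and the spread between machines is often larger than the spread within one.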

Re:not a surprise (4, Interesting)

im_thatoneguy (819432) | more than 5 years ago | (#26208459)

We have this problem at work.

We have a render farm of 16 machines. 12 of them are effectively identical but despite all of our coaxing one of them always runs about 30% slower. It's maddening. But "What can you do?". Hardware is the same. We Ghost the systems so the boot data is exactly the same... and yet... slowness. It's just a handicapped system.

Re:not a surprise (1)

visualight (468005) | more than 5 years ago | (#26208857)

Move processors around so you get a different boot proc, if you haven't tried that already.

Re:not a surprise (1)

Ethanol-fueled (1125189) | more than 5 years ago | (#26209085)

Ahh, the trusty ol' cycle 'n' swap. It's funny how complex problems often have simple fixes. Kinda like how the car won't start unless you kick the fender before you turn the crank.

Some people put together servers all day that way: swapping a bunch of intermittent crap in and out until the box runs long enough to install the OS :)

Re:not a surprise (0)

Anonymous Coward | more than 5 years ago | (#26209063)

The machine that's slower - is it:

..always the same machine? Then that machine is damaged.

..always a different machine? Then you have a networking bottleneck.


Re:not a surprise (1, Informative)

Anonymous Coward | more than 5 years ago | (#26209201)

There are a number of possibilities. Make sure the CPU family/model/stepping is the same between the slow and normal effectively identical machine. Check that the DIMMs are exactly the same and installed in the same slots as the other machines. You might even try plain swapping memory with a known good machine. Another thing to check is the PCI bus. If you have a card in one slot in one machine and in a different slot in another machine, it might make a difference as to how the BIOS allocates interrupts for other devices (which may affect how Linux's lame interrupt mapping sets priorities). If this render farm machine talks on the network, it could be its own ethernet adapter is having problems or the switch port to which it is connected. Check for errors logged on both sides (ifconfig eth0) -- also make sure the ports are running full duplex.
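On the interrupt point above: /proc/interrupts shows which CPU is actually fielding each IRQ, and comparing it across the "identical" machines can reveal the odd one out. A small parser sketch, run here against a canned two-CPU sample rather than a live system (the IRQ numbers and counts are made up; real files also contain summary rows like ERR: that this sketch ignores):

```python
SAMPLE = """\
           CPU0       CPU1
  0:    1200345          0   IO-APIC-edge      timer
 16:       5021     903144   IO-APIC-fasteoi   eth0
"""

def irq_counts(text):
    # First line names the CPUs; each later line is "IRQ: count count ... source".
    lines = text.splitlines()
    cpus = lines[0].split()                       # ['CPU0', 'CPU1']
    table = {}
    for line in lines[1:]:
        fields = line.split()
        irq = fields[0].rstrip(':')
        counts = [int(f) for f in fields[1:1 + len(cpus)]]
        table[irq] = dict(zip(cpus, counts))
    return table

# On a real machine: irq_counts(open('/proc/interrupts').read())
print(irq_counts(SAMPLE)['16'])   # -> {'CPU0': 5021, 'CPU1': 903144}
```

If one box funnels its NIC interrupts onto the same CPU the render job is pinned to, that alone can explain a chunk of the slowdown.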

I find it hard to believe (0)

Anonymous Coward | more than 5 years ago | (#26208133)

that cutting edge research is done in Virginia.

Re:I find it hard to believe (0)

Anonymous Coward | more than 5 years ago | (#26208383)

Yes Santa, there is a Virginia!

In summary.... (0)

johnlcallaway (165670) | more than 5 years ago | (#26208249)

So if you let the OS and compiler do it for you, performance results are not consistent between runs.

Wow ... what a shock....

What's next? A study that shows if you don't select any optimization parameters, a program won't run as effectively as it would with the best ones?

Well known problem (3, Insightful)

sjames (1099) | more than 5 years ago | (#26208411)

The problem is a complex one. Every possible scheduling decision has pluses and minuses. For example, keeping a process on the same core for each timeslice maximizes cache hits, but can lose if it means the process has to wait TOO long for its next slice. Likewise, if a process must wait for something, should it yield to another process or busy wait? Should interrupts be balanced over CPUs, or should one CPU handle them all?

A lot of work has gone in to those questions in the Linux scheduler. For all of that, the scheduler only knows so much about a given app and if it takes TOO long to 'think' about it, it negates the benefits of a better decision.

For special cases where you're quite sure you know more than the scheduler about your app, you can use the isolcpus kernel parameter to reserve CPUS to run only the apps you explicitly assign to them.

You can also decide which CPU any given IRQ can be handled by (but not which core within a CPU, as far as I know) with /proc/irq/*/smp_affinity.

Unless your system is dedicated to a single application and you understand it quite well, the most likely result of screwing with all of that is overall loss of performance.
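For anyone who does want to screw with it anyway: the smp_affinity files take a hex CPU bitmask, and writing one (as in the echo below) requires root. A tiny helper to build the mask - the IRQ number 19 is just an example:

```python
def cpu_mask(cpus):
    """Build the hex bitmask /proc/irq/*/smp_affinity expects, e.g. {0, 2} -> '5'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu      # bit N set means CPU N may handle the IRQ
    return format(mask, 'x')

print(cpu_mask({0, 2}))   # -> 5
# Applying it (root required):  echo 5 > /proc/irq/19/smp_affinity
```

Combined with isolcpus and explicit taskset/sched_setaffinity assignment, that gives you full manual control - and full responsibility when it goes wrong.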

What if... (1)

raftpeople (844215) | more than 5 years ago | (#26208565)

We added 4 more cores to perform this "thinking" about which core the process should run on, we should be able to get back that 10% we lost, right?

Interrupt redistribution (0)

Anonymous Coward | more than 5 years ago | (#26208597)

TFA doesn't seem to specify, but I assume they're referring to Linux. Recent versions of Solaris (and also HP-UX) already have some of this functionality in what they call an "interrupt redistribution daemon".

This isn't hardware (2, Informative)

multimediavt (965608) | more than 5 years ago | (#26209227)

Why is this article labeled as hardware? Sure, they talk about different procs being ... well, different. Duh! The article is about the software Tom and others developed to run processes more efficiently in a multi-core (and possibly heterogeneous) environment. Big energy savings as well as a performance boost. Green computing. HELLO! Did you read page two?

Basic SMP/NUMA (0)

Anonymous Coward | more than 5 years ago | (#26209287)

Can't see what the big news is: any single-socket multi-core system would look like simple SMP, and a multi-socket system would have some NUMA characteristics. So affinity scheduling, locality- and behavior-aware memory allocation, and some interrupt fencing should create deterministic behavior :)

Guess he should try OpenSolaris, been there, tried that and so forth :)
