Casting a Jaundiced Eye On AnTuTu Benchmark Claims Favoring Intel
MojoKid writes "Recently, industry analysts came forward with the dubious claim that Intel's Clover Trail+ low power processor for mobile devices had somehow seized a massive lead over ARM's products, though there were suspicious discrepancies in the popular AnTuTu benchmark that was utilized to showcase performance. It turns out that the situation is far shadier than initially thought. The version used in testing with the benchmark isn't just tilted to favor Intel — it seems to flat-out cheat to accomplish it. The new 3.3 version of AnTuTu was compiled using Intel's C++ Compiler, while GCC was used for the ARM variants. The Intel code was auto-vectorized, the ARM code wasn't — there are no NEON instructions in the ARM version of the application. Granted, GCC isn't currently very good at auto-vectorization, but NEON is now standard on every Cortex-A9 and Cortex-A15 SoC — and these are the parts people will be benchmarking. But compiler optimizations are just the beginning. Apparently the Intel code deliberately breaks the benchmark's function. At a certain point, it runs a loop that's meant to be performed 32x just once, then reports to the benchmark that the task completed successfully. Now, the optimization in question is part of ICC (the Intel C++ compiler), but was only added recently. It's not the kind of procedure you'd call by accident. AnTuTu has released an updated "new" version of the benchmark in which Intel performance drops back down 20-50%. Systems based on high-end ARM devices again win the benchmark overall, as they did previously."
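For readers unfamiliar with auto-vectorization, here is a minimal sketch of the kind of loop a compiler can turn into SIMD code (SSE on x86, NEON on ARM) without any source changes. The function name saxpy and its signature are illustrative assumptions and have nothing to do with AnTuTu's actual source. The restrict qualifiers promise the compiler that the two arrays do not overlap, which is usually what makes vectorizing safe.

```c
#include <stddef.h>

/* Illustrative kernel (not from AnTuTu): y[i] = a * x[i] + y[i].
   Each iteration is independent of the others, and `restrict` tells the
   compiler that x and y never alias, so it is free to process several
   elements per instruction if the target has SIMD units. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC, auto-vectorization is part of -O3 (or -ftree-vectorize), and 32-bit ARM builds typically also need a NEON-enabling option such as -mfpu=neon. Treat the exact flags as something to verify against your compiler's documentation. Whether the loop is actually vectorized depends on the compiler, the target and the flags, which is the asymmetry the summary complains about.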
Benchmarks, trustworthy? (Score:1)
Of course not.
Make them do real work loads. With Monkeys.
Re: (Score:1)
What became of the famous "gaussian blur benchmark"? What could be more universal? My personal favorite is hitting the "x^2" key on the calculator until it took more than two minutes to get a result.
Comment removed (Score:5, Informative)
Re: (Score:1)
Why would you use an Intel compiler on a non-Intel CPU?
Re: (Score:1)
The real question is why would you use an Intel compiler. At all. Period.
Re: (Score:1)
Because for Intel CPUs it is actually really good?
Re: (Score:2)
Nope. If you remove the checks in the resulting compiled binary, the Intel-optimized version of the code runs faster.
I'd call that shady.
Re: (Score:2)
The Intel compiler puts code in the compiled binary that does the checking. It doesn't matter whether you compile on Intel; the resulting binary is crippled and runs slower than necessary on AMD.
I worked with a guy that wrote a program to patch said binaries to remove the checking - this resulted in a nice speedup on all our boxes, since it was an AMD shop.
GP is absolutely right.
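To make the complaint above concrete, here is a hedged sketch of the difference between choosing a code path by vendor string and choosing it by capability. It is not Intel's dispatcher. The struct cpu_info, the enum code_path and both functions are hypothetical stand-ins for what a real runtime check would read from CPUID.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical CPU description, standing in for values read from CPUID. */
struct cpu_info {
    const char *vendor;  /* e.g. "GenuineIntel" or "AuthenticAMD" */
    bool has_sse2;       /* the feature the fast path actually needs */
};

enum code_path { PATH_GENERIC, PATH_SSE2 };

/* Vendor-based dispatch: the fast path is taken only when the vendor
   string matches, even if another vendor's CPU supports the feature. */
enum code_path pick_by_vendor(const struct cpu_info *cpu) {
    if (strcmp(cpu->vendor, "GenuineIntel") == 0 && cpu->has_sse2) {
        return PATH_SSE2;
    }
    return PATH_GENERIC;
}

/* Capability-based dispatch: only the feature flag matters. */
enum code_path pick_by_feature(const struct cpu_info *cpu) {
    return cpu->has_sse2 ? PATH_SSE2 : PATH_GENERIC;
}
```

Under these assumptions, a CPU described as { "AuthenticAMD", true } gets PATH_GENERIC from pick_by_vendor and PATH_SSE2 from pick_by_feature. Patching out the vendor check, as described above, amounts to turning the first function into the second.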
Re:Benchmarks, trustworthy? (Score:5, Insightful)
To be fair, any use of a benchmark to judge which system to buy is pretty silly. The best benchmark you can make is something that is identical to your intended workload; e.g., play a game or use an application on several systems, and see which feels better to you.
Taking some code written in a high-level language and compiling it for a platform is a great benchmark - if that's what you're going to be doing with the system. But you'd better be using the compiler you'll be using on the system. If you need free, you should test GCC on both. If you are considering buying Intel's compiler (it's not free, is it?), then add it in as another test to see if it's worth the extra outlay of cash. Intel puts a lot of work into making compilers very good on its systems, so if you're going to use the Intel compilers for Intel systems, it's perfectly valid to compare against using GCC on an ARM platform, if that's what you'd be using on ARM.
But if most of what you're running will be compiled in GCC for either platform, yes, you should absolutely test GCC on both.
That said, much of what's noted isn't necessarily intentional wrongdoing. For the example of breaking functionality, it's quite possible that the compiler made a perfectly valid optimization to get rid of 31 of the 32 loop iterations. One of my professors once told a story about how he wrote a benchmark, and upon compiling it, found that he was getting some unbelievably fast results. As in literally unbelievable - upon investigation, he discovered that the main loop of the benchmark had been completely optimized away, because the loop was producing no externally visible results. (As an example, if the loop were to do "add r3 = r2, r1" 32 times, a good compiler could certainly optimize that down to a single iteration of the loop; as long as r2 and r1 are unchanging, then you only need to do it once. Similarly, even if r1 and r2 are changing on each iteration, you need to use the result in r3 from each iteration of the loop, otherwise you could optimize it to only perform the final iteration, and the compiler could pre-compute the values that would be in r2 and r1 for that final iteration.)
So perhaps it's a bad benchmark - but I wouldn't default to calling it malicious, just that the benchmark isn't measuring what you might want it to measure. And quite frankly, most users aren't going to be doing anything that even vaguely resembles a benchmark anyway, so they really have little justification to make a buying decision based on them.
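The professor's story and the "32 iterations collapsed to one" case can be sketched in C. This is only an illustration under stated assumptions: the kernel below is a generic xorshift mixing step picked for the example, and none of the function names come from AnTuTu.

```c
#include <stdint.h>

#define ITERATIONS 32

/* Hypothetical benchmark kernel: one xorshift mixing step. */
static uint32_t kernel(uint32_t x) {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Fragile: every iteration recomputes the same value from the same
   input, so the compiler may legally run the kernel once instead of
   32 times. The returned value is identical either way. */
uint32_t fragile_benchmark(uint32_t seed) {
    uint32_t result = 0;
    for (int i = 0; i < ITERATIONS; i++) {
        result = kernel(seed);
    }
    return result;
}

/* Sturdier: each iteration consumes the previous result and the final
   value is returned, so skipping iterations would change the answer. */
uint32_t chained_benchmark(uint32_t seed) {
    uint32_t state = seed;
    for (int i = 0; i < ITERATIONS; i++) {
        state = kernel(state);
    }
    return state;
}
```

A timing harness built around fragile_benchmark measures whatever the optimizer leaves behind, which may be a single call. The chained version at least forces the full dependency chain, although a sufficiently aggressive compiler could still precompute it when the seed is a compile-time constant, so reading the seed at run time is the safer design.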
Re: (Score:2)
And that's exactly what benchmarks are supposed to approximate. If they aren't doing that, it's because they are bad benchmarks.
People can't go and get hands-on with every system out there, and even if they could, they can't just install all their own software on it and try it out for a few days... so we need some objective
Re: (Score:3)
Re: (Score:2)
Let me start this by saying I'm no fan of Intel - quite frankly, many of their business practices are a little suspect, and they've had some downright nasty ones before (like selling a bundle of CPU + Northbridge for less than the CPU alone, and then saying it violated the agreement if the OEM buyer decided to toss the Northbridge in the garbage in favor of a different manufacturer's chipset). But I don't see a slam-dunk case for antitrust in this alone.
The first reason is that there may actually be technic
Re: (Score:2)
Re: (Score:2)
Are you talking about the compiler that was checking the processor ID instead of the capabilities of the processor? That's an old story that was fixed a long time ago.
In all fairness, compiler optimizations are close to black magic. The only reasonable way to know what is best is to test multiple compilers and see what comes out. Depending on the code, one compiler will do better than another. Even on Intel platforms, depending on the benchmark, sometimes gcc performs much better, sometimes icc performs b
Comment removed (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
I've been all AMD almost forever, for this reason among others.
http://forums.pcper.com/showthread.php?470102-Intel-s-compiler-cripples-code-on-AMD-and-VIA-chips [pcper.com] 2010
http://www.theregister.co.uk/2009/12/16/intel_ftc/ [theregister.co.uk] 2009
http://techreport.com/news/8547/does-intel-compiler-cripple-amd-performance [techreport.com] 2005
I found those three on the first page of my search results, and quit looking. Different search terms and a more determined search will find hits as old as about 1999, maybe even older. Hard to remember, but
Re: (Score:2)
Re: (Score:2)
But still... (Score:3, Insightful)
Re: (Score:3, Insightful)
If you use icc instead of gcc for x86, then you should use the ARMCC compiler or Keil or one of the others for ARM.
Yet it does not make this "benchmark" honest. (Score:5, Insightful)
Re: (Score:2)
Well, that is stupid. You NEVER run identical code on different architectures, especially when they are not even binary compatible. You almost always optimize the particularly important parts of the code for a given architecture. Querying cache sizes and checking the number of hardware contexts are common things.
For instance libcairo has some NEON specialized code path. ffmpeg contains codepath for pretty much every single architecture out there.
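As a small illustration of the "checking the number of hardware contexts" point, here is a sketch in C. It assumes a Unix-like system where sysconf accepts _SC_NPROCESSORS_ONLN (available on Linux, macOS and the BSDs, but not guaranteed by POSIX itself). The helper names online_hardware_contexts and pick_worker_count are made up for this example.

```c
#include <unistd.h>

/* Number of hardware contexts currently online, as reported by the OS.
   Assumes _SC_NPROCESSORS_ONLN is supported; falls back to 1 when the
   query fails or reports nothing useful. */
long online_hardware_contexts(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical tuning helper: never start more workers than the
   machine has hardware contexts, and always start at least one. */
long pick_worker_count(long requested) {
    long available = online_hardware_contexts();
    if (requested < 1) {
        return 1;
    }
    return requested < available ? requested : available;
}
```

A benchmark that sizes its thread pool this way runs different amounts of parallel work on different machines by design, which is the sense in which the code is adapted per platform while the input and output stay the same.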
Re: (Score:3)
You run configure (various options such as --enable-gpl for FFmpeg) && make for each platform. For benchmarking I guess you could do make check for Cairo, but that is not a very good test, as make check needs exactly the right versions of Ghostscript, various fonts and I don't know what else. For FFmpeg you could run make fate after downloading the samples and time it. This would be a fairly good C benchmark for various CPUs because, as you stated, there are code paths for a hell of a lot of CPUs. The
Re: (Score:2)
He probably meant identical in the sense that the input->output is identical, which is what benching two systems should be about anyway.
Re: (Score:1)
Re: (Score:2)
They're rigging results by using parameters and optimizations useful only for the benchmarks in question. In other words, unless the only thing you use processors for is benchmarks, you have learned absolutely nothing about how this processor will work in any real world application.
Re: (Score:2)
It is the suite of tools, not just the processor. If intel offers a better processor/compiler package than is available for arm why shouldn't they tout it?
Because you'll be stuck with the architecture for quite some time, while the SW tools may evolve faster than you think (not to mention the fact that there's always the profiler, compiler intrinsics, and inline assembly, if you need top performance right here, right now, for a particular piece of code, and then, only your brain and the piece of silicon come into the equation, not some silly compilers).
Re: (Score:1)
If you want to know why they shouldn't present honest results, it looks like you're going to have to ask them, because it seems they didn't. Until they explain why, the usual reason people put their thumb on the scale is that they know they can't win honestly.
Re: (Score:3)
It is the suite of tools, not just the processor. If intel offers a better processor/compiler package than is available for arm why shouldn't they tout it? I'm not saying they are presenting it in the correct way, but I do think they have a valid point they want to make. That with Intel you get more than a CPU, you get a heck of a lot of tool expertise. And for some people that is worth something.
Absolutely correct: you should judge the combination of processor + commonly used compiler. For example, if Apple built an iPad with an Intel processor, then any iPad app would be built with Clang for ARMv7, Clang for ARMv7s, and Clang for x86_64, and you could directly compare all three versions.
However, you must be careful. You need to check real-life code. If you run identical code 32 times and an optimising compiler figures out you need to do it only once, that's not real-life. If this is what your benc
Re: (Score:2)
Re: (Score:3)
At least Linpack performs actual linear algebra, so coding to that particular test will help some people with real workloads (i.e. scientific software that uses Linpack). It's definitely not representative of everyone's workload, though.
Duplicate? (Score:1)
http://hardware.slashdot.org/story/13/07/12/1558209/new-analysis-casts-doubt-on-intels-smartphone-performance-vs-arm-devices?sdsrc=rel
Re: (Score:3)
Just the controversy. The news, buried at the bottom of the article, is that AnTuTu has a newer version that drops Intel performance back to where it was before.
Fixed, apparently (Score:3, Informative)
In fairness to AnTuTu they released a new version which tries to rectify the problem:
http://www.eetimes.com/author.asp?section_id=36&doc_id=1318894& [eetimes.com]
Re: (Score:2)
Yeah, but does the new version remove optimizations from the Intel compile or (the right way) add those to the ARM version?
seriously though.. who gives a fuck. the tests should be done with the usual android toolchain... it's not like anyone is going to use _that_ intel processor for scientific computing.
Anyone remember the days... (Score:3)
...where companies used to rig benchmarks?
Oh right, we're still not past them.
AND WE'LL NEVER BE!
Always use real world applications, in actual, real usage. Never benchmarks.
Useful (Score:2)
I know some ignorant people that will take these benchmarks as gospel in their righteous views.
Re: This is what Intel's millions of PR spend achi (Score:2)
>high-performance computing
winner, no contest: Intel's best CPU, plus the best GPU money can buy. Why hobble a kick-ass GPU with a second-rate CPU?
> excitement
I don't know about YOU, but I get more excited by maximum performance than by power consumption, cost, or marketing. Winner: Intel
I don't wish AMD ill. I'd *much* rather see AMD pulling ahead of Intel & forcing both into a deathmatch for better performance. Let's not forget that AMD happily showed us that they can be as expensive & mean
Re: (Score:2)
>high-performance computing
winner, no contest: Intel's best CPU, plus the best GPU money can buy. Why hobble a kick-ass GPU with a second-rate CPU?
Because power consumption is very important for HPC...
It might not matter so much for a single user's desktop, but when you scale to thousands of processors the extra power consumption can cost serious amounts of money, both in the power the processors consume and in the extra power needed to keep them cooled.
If your HPC workload primarily uses the GPU, then the CPU may even be sitting idle most of the time; your CPU only needs to be fast enough to keep the GPU fed with data.
Also for HPC, throughput is im
Time for ARM to invest in GCC (Score:3, Insightful)
ARM looks like a sore loser here.
>GCC isn't currently very good at auto-vectorization, but NEON is now standard on every Cortex-A9 and Cortex-A15 SoC
So the conclusion is to remove Intel optimizations instead of improving ARM ones?
Re: (Score:2)
Well, no. There are better compilers out there for ARM. Keil for one. More important, though, is the fact that real code that cares about performance won't just write a loop and let the compiler take care of it; it will use optimized libraries (which both Intel and ARM provide).
Compiler features like auto-vectorization are neat and do improve spaghetti code performance somewhat but anyone really concerned with performance will take Intel's optimized libraries over them. So if we're going to compare performa
Re: (Score:1)
Re: (Score:2)
fix it then... (Score:1)