Arrays vs Pointers in C?
UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?"
"Here are my own thoughts on the issue:
For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.
The first function uses array indexing:

    #include <string.h>   /* for strlen */

    void reversestring_array(char *str)
    {
        int head, tail;
        char temp;

        if (!str) return;

        tail = strlen(str) - 1;
        for (head = 0; head < tail; ++head, --tail)
        {
            temp = str[tail];
            str[tail] = str[head];
            str[head] = temp;
        }
    }

The second function uses pointers:

    void reversestring_pointer(char *str)
    {
        char *phead, *ptail, temp;

        if (!str) return;

        ptail = str + strlen(str) - 1;
        for (phead = str; phead < ptail; ++phead, --ptail)
        {
            temp = *ptail;
            *ptail = *phead;
            *phead = temp;
        }
    }

While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.
Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.
My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is not known until run time.
Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...
The array function:

forloop:
    mov bl,byte ptr [esi+edx]
    mov al,byte ptr [ecx+edx]
    mov byte ptr [ecx+edx],bl
    mov byte ptr [esi+edx],al
    inc esi
    dec ecx
    cmp esi,ecx
    jl forloop

The pointer function:

forloop:
    mov bl,byte ptr [ecx]
    mov dl,byte ptr [eax]
    mov byte ptr [eax],bl
    mov byte ptr [ecx],dl
    inc ecx
    dec eax
    cmp ecx,eax
    jb forloop

While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."
Why do you care? (Score:4, Insightful)
If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.
Re:Why do you care? (Score:2)
Re:Why do you care? (Score:5, Funny)
Unless of course that someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger all programmers must start programming in TI-99/4a Basic right now.
Re:Why do you care? (Score:3, Funny)
Re:Why do you care? (Score:5, Interesting)
then don't wank around optimizing
Dude, best use of the word wank. Ever!
for single cycles on a machine that probably bleeds off a million cycles every time you raise a window
Computers have become more powerful and programmers have become more lazy. Well, it's not strictly true: instead of spending a lot of time writing efficient code, programmers now spend a lot of time writing a lot of code to fill bigger machines. That million cycles is wasted doing crap, and probably half of it doesn't need to be done anyway...
I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days where "nobody will ever need more than 640k" software was designed and optimised to be small and chew up few cycles.
Now, with RAM and Gigahertz available for next to nix software has just bloated out. It's nice to see a programmer thinking about efficiency/size even if it is purely academic. We should be encouraging that; I know I'd like my applications to work faster and carry less crap than they do currently.
Re:Why do you care? (Score:2)
I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away. Well not if you waste it on bloatware. And at any rate, it doesn't improve the hardware I own. Efficiency will cease to matter as soon as they start buying my hardware and electricity.
Re:Why do you care? (Score:2)
I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away.
Not a dig at the OP, just an observation:
Faster hardware will not make the inefficiency go away - it's still the same amount inefficient. If something wastes 30% of its time wanking on, it doesn't matter how fast you make it go, it's still wasting 30% of its time wanking on.
Faster hardware just makes minor inefficiencies less noticibl
Re:Why do you care? (Score:3, Insightful)
Faster hardware just makes minor inefficiencies less noticeable, so programmers add more minor inefficiencies and apply the same "but faster hardware will make it ok" attitude instead of fixing the real problem!
None of the discussion I've seen so far has touched the real problem. It's not the wanking over micro-optimisation (these compiler optimisations only buy constant factors; the loop is O(n) either way) but that so few people have any sort of a clue about which algorithms to use and when.
I'm not saying that it's bad to optimize those sort of thing
Re:Why do you care? (Score:3)
Re:Why do you care? (Score:5, Insightful)
For computer programs, you won't gain anything worthwhile by optimizing code that the computer only spends 1% of its time executing. That's not to say you should do a sloppy job, but you should focus on what matters. Microoptimization techniques (those techniques that involve choices of instructions and their orders rather than changing the algorithm that is used) typically yield very small gains. Microoptimization can yield substantial benefits when used properly in heavily used sections of code, but the time involved in trying to microoptimize everything could be better used to work on macrooptimizations or organizing the code to make it more amenable to later modification.
There's no sense in trying to make your program 0.3% faster when you could be finding a way to make it 20% faster instead.
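To put numbers on that: if a fraction p of the run time is in the code you speed up by a factor s, the overall speedup is 1 / ((1 - p) + p/s). Micro-optimizing a 1% region to infinite speed buys at most 1 / 0.99, about 1.01x overall; merely doubling the speed of a 20% region buys 1 / (0.8 + 0.1), about 1.11x.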
Re:Why do you care? (Score:4, Interesting)
You forgot to mention that keeping an 80x25 column text display updated only required moving around a maximum of 4000 bytes at a time (80x25 = 2000 bytes for the monochrome text buffer + another 80x25 colour buffer for 16 background and 16 foreground colours per text block), and that a sub 10MHz CPU would certainly struggle to animate that at 30 frames per second or higher.
I grew up in the days of the Amiga - and they certainly didn't have an 80x25 text console... so your analogy is fundamentally flawed. All of my favourite Amiga software was lightweight, efficient and responsive (except for the ray tracing engines I used, but those did _actually_ have to do some serious calculation). In fact, my Amiga ticked along on an 800x600 screen quite nicely. Your analogy is also flawed because not all of the screen is updated at any one time; only the parts that have changed. Oh, and 4000 bytes x 30 FPS is only 120KB/s;
My Commodore 64 has a 700kHz processor in it and it can certainly animate the full screen at 25 frames per second, albeit at a slightly reduced resolution. My Atari 2600 has a CPU of about the same speed and it was capable of keeping up with 25 frames per second (again, at lower resolution) and still running the game engines just fine. Your argument doesn't hold water, pal!
My 1920x1200 colour display requires 55296000 bytes (1920x1200*24), which is 13,824 times as much as that 80x25 text display. Now despite my 2GHz CPU only being 200 times faster than the hypothetical 10Mhz CPU, it doesn't struggle at all - not only can it animate the whole screen at 60FPS or more, but it can also calculate positions for thousands of objects at the same time.
What exactly do you do on your 1900x1200 colour display? First, 1900x1200x24 bits is actually 1900x1200x3 bytes (6,840,000 bytes). That is a far cry from your piss-ant 55,296,000 bytes (55MB).
Given your eagerness to quote numbers that are practically meaningless to the point and blatantly inaccurate, and your "calculation of positions of thousands of objects" I suggest you are playing games on Windows.
Again, your numbers are flawed, because my old 80x25 text mode display is still drawn from individual pixels. Those pixels still have to be updated individually by the CPU (in the 10MHz days it was often the main CPU that drew every little dot). 80x25 is drawn on a 640x480 screen in 16 glorious colours. Now, by your argument, 640x480x2 (your 2-image argument from above) = 614,400 bytes. Therefore, your 1900x1200 screen only requires 12x as many bytes to move about, but I really don't care about those numbers because most of the work is done in the GPU now; not the main CPU.
There are certainly some areas in which the software doesn't seem to have sped up in proportion to the processor speed - but my guess is thats mostly because they ceased being CPU bound years ago, not because the CPU is flat out wasting cycles trying to do the job. That doesn't mean it's not the programmer's fault - but if it is, it's because they decided to block the interface while they waited for a DNS query (or similar bad design decision), not because they used a pointer offset instead of an array dereference, or vice versa.
Software bloat is because programmers can get lazy when the CPU gets fast (the old "oh there'll be better CPUs by the time we release it" excuse). Looking at the power consumption figures for a modern Pentium4 CPU and figuring out that it wastes billions and billions of cycles a day doing work that it should never have needed to do is scary. If you average it out over all the PCs running in the world the amount of energy turned into heat because of sloppy programming alone would be enormous.
What about all the wasted hours waiting for the computer to do something because of some sloppy programmer being willing to waste a few million CPU cycles here and there? It doesn't seem like much at the micro level, but think about it across all the processes that run on your machine of a day and the few
80x25 (Score:3, Interesting)
Re:Why do you care? (Score:2)
Re:Why do you care? (Score:5, Insightful)
When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for; or how exhaustively you can search for features during one frame; or what resolution image you are able to work on.
If your code does something nice and graphical where 99% of the time is spent waiting for the user, sure, you're absolutely right. And if your system is doing something inherently bounded - it works until it's done, then it stops and waits until it's time again - then all you need is to make it fast enough and no faster. But there are real-world systems that today, and for the foreseeable future, are bounded by the available processing power and that can always benefit from any improvement in execution time.
Re:Why do you care? (Score:2)
When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for;
Hence his question. In
Re: Why do you care? (Score:2)
Here are the big-picture rules of thumb:
It only p
Photoshop plug-ins (Score:4, Informative)
Since this type of code typically iterates over a few hundred million pixels, you'd think that changing such details as array vs. pointer or some other common optimization technique would have an impact.
It does; it typically shaves a few tenths of a second off a 5-minute calculation.
Then again, spending that same amount of time altering the algorithm will usually increase performance in a noticeable way.
Nowadays I don't bother optimizing code (usually the compiler does a better job of it anyway) but rather optimize the algorithms. Instead of opening this topic and waiting for a definitive answer in your quest for ultimate performance, you could probably have rewritten the algorithm and gained much more performance than you'd ever get this way.
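To make that concrete, here is the kind of algorithmic change that dwarfs any array-vs-pointer tweak in pixel code - a sketch assuming 8-bit channels and a hypothetical gamma adjustment, not the poster's actual plug-in:

    #include <math.h>

    /* Naive: one pow() per pixel - hundreds of millions of transcendental calls. */
    void gamma_naive(unsigned char *px, long n, double g)
    {
        for (long i = 0; i < n; i++)
            px[i] = (unsigned char)(255.0 * pow(px[i] / 255.0, g) + 0.5);
    }

    /* Algorithmic fix: an 8-bit channel has only 256 possible values,
       so compute each once and turn the hot loop into a table lookup. */
    void gamma_lut(unsigned char *px, long n, double g)
    {
        unsigned char lut[256];
        for (int v = 0; v < 256; v++)
            lut[v] = (unsigned char)(255.0 * pow(v / 255.0, g) + 0.5);
        for (long i = 0; i < n; i++)
            px[i] = lut[px[i]];
    }

No amount of pointer-vs-index fiddling inside gamma_naive gets close to simply not calling pow() per pixel.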
Hundreds of possibilities (Score:2, Insightful)
This question sounds to me like discussions I had ages ago with other programmers, and it was always one programmer trying to justify his method of coding a routine over some other, equally valid method.
Pointer arithmetic and arrays in C really have the same issues. You can access beyond the array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.
I never really saw the point in arguing over array and pointers in C. When I programmed i
But not everything is a "modern PC". (Score:2)
Re:But not everything is a "modern PC". (Score:2)
but reading the submission it really looked to me like someone who simply wanted to justify their opinion on something. he wanted a generic answer to a question which requires a specific answer.
if someone here said the pointer method was always faster he would run back to his friends and declare he was right all along.
the fact is the answer requires knowledge of all the target platforms as well as all of the compilers involved.
pointer method is generally faster and more efficient; but it makes fo
WOW! (Score:2, Insightful)
is this really your bottleneck? (Score:2, Insightful)
One can always check the resulting assembly code, if one is so concerned about this.
Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)
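For the record, getting that assembly is a one-liner with the common compilers (the file name reverse.c is my placeholder):

    gcc -S -O2 reverse.c      (GCC: assembly lands in reverse.s)
    cl /FAs /O2 reverse.c     (MSVC: annotated listing in reverse.asm)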
The eighties called... (Score:3, Funny)
Unfortunately, the 2000s answered... (Score:2)
...and said they can't have them because all these new-fangled alternatives suck when you've got serious work to do.
In my experience ... (Score:2, Interesting)
However, for more complicated loops the compiler fails to optimize the loop very well. My 'experience' was in writing an algorithm which took raw camera CCD sensor data and ordered it into a proper Bayer pattern. When moving from index-based access to pointer-based access I saw a significant increase in performance.
Even basing one pointer off of another was slower than maintaining separate po
It optimizes out (Score:5, Interesting)
Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
Re:It optimizes out (Score:2)
Re:It optimizes out (Score:5, Funny)
Hah, yeah. Last week in a public toilet in London, someone drew an arrow pointing to the toilet roll, and it said "Computer degrees - please take one".
Guess that about sums it up
Re:It optimizes out (Score:2)
Yes, it does make a substantial difference in speed with some systems/compilers.
Re:It optimizes out (Score:2, Informative)
The point is now a bit moot since for many loops you do not want either indexing or pointer arithmetic, you want SIMD ins
Re:It optimizes out (Score:3, Interesting)
Then I remembered that this is Slashdot, where the groupthink is that a CS degree is useless and doesn't teach anything you need to know in the real world.
I echo the above statements (Score:5, Insightful)
IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
Re:I echo the above statements (Score:5, Interesting)
I call bullshit. Optimizations are important regardless of the language or CPU.
My Pentium III test machine with 256MB of RAM blew away a dual-processor Intel system with 1GB of RAM while parsing a 30MB XML import/export file.
It took over six hours on the dual processor system with the native
My C program parsed it in an hour and a half. And yes, I used pointers. Why? Because it's more efficient.
Time is money, especially when you're trying to push down 20,000 price changes from the mainframe to 2000+ POS units during the off hours. The solution? We put my C routine into a shared library callable via C# or Java. Bonus: the C code gets it done in under an hour on the dual-CPU machine. And yes, I tested the inputs for overflows, security problems, whatever, before we went into production. There's a big difference between a programmer who knows a language and a programmer who understands it.
IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
That's your opinion and I'm glad you shared it. You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic on the CPU, don't you? I wouldn't want to work for you, though. Your competitors will write a faster program that uses less memory and you will lose the contract/job.
No flame intended,
Enjoy.
Re:I echo the above statements (Score:3, Interesting)
I remember one code review where someone told me to use prefix instead of postfix increment, for optimization reasons. Yet it occurred in an initialization routine
Re:I echo the above statements (Score:3, Interesting)
Heh, a little offtopic but - This is why I hate XML. It's so bloated. You take 1 to 6 hours parsing a 30 megabyte XML file in C? I was just tasked with parsing out some select data from a 37 gigabyte XML file (870 million lines). I tried all sorts of optimizations and parsers upon finding that it might take days to parse. My solution - 50 lines of perl usin
Re:I echo the above statements (Score:3, Interesting)
One of the more interesting things I've seen is to see how different software developers write a program for the following problem:
Re:I echo the above statements (Score:3, Interesting)
Thinking like this (Score:3, Insightful)
Secondly, these operations can add up. If, for example, this scenario is used throughout the code and called several times per second, on an operation that perhaps requires ready output, the output might even be visibly delayed on a faster
Re:I echo the above statements (Score:3, Informative)
You realise that pointer arithmetic and array indices are the same thing, don't you? I.e. given:

    int b[100];
    int i;

the following are equivalent:

    b[i++]
    *(b + i++)

Arrays *are*
Re:I echo the above statements (Score:3, Funny)
Different methods (Score:2)
In consequence, if you want a predictable program on ANY compiler, pointer operations are probably a better bet than arrays, regardless of how well compilers handle those arrays.
If you want something of uniform reliability, rather than uniform
Re:Different methods (Score:2)
But if the code is unclear, let's say multiple nested loops with a good number of branches, local variables, etc. then an optimizing compiler will probably say WTF and just leave it alone.
As has been said many times, profile your code. If you find the bottle-neck, fix it. If it comes down to pointer vs. array, get rid of th arr
Strength Reduction (Score:4, Informative)
Do This instead (Score:3, Interesting)
{ }
That'll teach ya.
Re:Do This instead (Score:2)
Re:Do This instead (Score:2)
Naturally neither version will actually use any added memory (as in RAM) but I doubt an optimising compiler will be able to perform strength reduction on the XOR operations.
Don't try to out-think your compiler. The compiler is your friend.
there is a difference... (Score:2)
if you look closely at these two portions of the assembl
Re:there is a difference... (Score:2)
Basically, you'd store a pointer somewhere in memory page 0, and then use the y register as an index into whatever it pointed at (i.e. LDA ($a0),y would take the pointer stored in addresses $a0 and $a1, use the y register as an index, and read the contents of the resulting address into the accumulator). Here it would be faster to manipulate the index and not the pointer.
Pointers were 16 bit (64k address space) and x and y were 8 bit registers.
Re:there is a difference... (Score:4, Informative)
Actually, most x86s have dedicated address generation units which handle those index additions in parallel, on separate logic from the main ALU. So both cases would actually run at the same speed on a modern x86.
Re:there is a difference... (Score:2)
*tail ^= *str; *str ^= *tail; *tail ^= *str;

for the swap, as this will be faster on many more architectures. And it's more memory efficient.
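Worth a caveat before anyone pastes that in: the XOR trick destroys the value when both operands are the same object, so it needs an aliasing guard - a minimal sketch:

    /* XOR swap without a temporary. If a == b, the value XORs with
       itself and becomes 0, so guard against the pointers aliasing. */
    static void swap_xor(char *a, char *b)
    {
        if (a != b) {
            *a ^= *b;
            *b ^= *a;
            *a ^= *b;
        }
    }

(In the reverse loop from the article, phead < ptail guarantees the pointers never alias, so it happens to be safe there.)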
There cannot be any difference. (Score:2, Informative)
Every C programmer should know the answer to the following question:
What is 6["abcdefghijkl"]?
Answer: 'g'.
How is this determined?
By definition, x[y] = *(x + y) = y[x].
Don't believe me, check the standard.
( Yeah this is a degenerate case, like Duff's device. Still a compiler has to support it. )
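A throwaway test for the doubters (a sketch, nothing more):

    #include <stdio.h>

    int main(void)
    {
        /* By definition x[y] == *(x + y) == y[x], so an integer can
           index a string literal just as the literal indexes it. */
        printf("%c\n", 6["abcdefghijkl"]);   /* prints: g */
        printf("%c\n", "abcdefghijkl"[6]);   /* prints: g */
        return 0;
    }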
In summary... (Score:2)
In other words, "No, you cannot."
Your best bet is to do (almost) what you did - compile a test case per platform, check the code - and then the part that you neglected, you need to count the clock ticks per instruction. To reinforce what one poster said - there can be a large difference between [ecx] and [ecx+edx], conceptually speaking. If it truly matter
Re:In summary... (Score:3, Funny)
Not necessarily. You're ignoring the fact that modern processors use pipelining architectures, branch prediction, extensive caching often at several levels, and a whole host of other things that mean the total time required is not the sum of the individual times for each instruction.
One of these days, someone will invent a program that can take
Re:In summary... (Score:2)
"When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" is the quesiton.
"You're ignoring the fact that modern processors use pipelining architectures, " you propose.
As I said, the bulk of the posts end up saying, "any good compiler will/should"...
In other words, "No, you cannot."
But even in the case with chip-level optimizations, your point is still moot; for those processors it doesn't make any difference for you - call them
differences not having to do with optimization (Score:2)
int array[8];
int *int_ptr = malloc(8 * sizeof(int));
char string[] = "Hello world!";
char *char_ptr = "Hello world!";

sizeof(array) == 8 * sizeof(int);
sizeof(int_ptr) == sizeof(int *);

You can get int_ptr to point to another location in memory (such as by doing ++int_ptr); the similar statement (++array) is illegal.

I think that altering individual characters through char_ptr is undefined (as it would result in changing the value of the string literal), while (as above) string cannot be pointed at a new literal string.
Re:differences not having to do with optimization (Score:2)
Nope, sorry, he was right and you're wrong. In the code given, char_ptr points to a string literal, which is automatically constant data and can be stored somewhere completely different to either the stack or the heap. Attempting to change that data results in undefined behaviour.
(For completeness: in C++, you can only even point a (non-constant) char * to a string literal b
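The cases under discussion, side by side (a compileable sketch of the parent's point):

    #include <stdio.h>

    int main(void)
    {
        char string[] = "Hello world!";   /* array: a writable copy of the literal */
        char *char_ptr = "Hello world!";  /* pointer to the literal itself */

        string[0] = 'J';                  /* fine: the array's storage is writable */
        /* char_ptr[0] = 'J'; */          /* undefined behaviour: the literal may
                                             live in read-only storage */
        printf("%s\n", string);           /* prints: Jello world! */
        return 0;
    }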
your mileage will vary, but... (Score:4, Informative)
have you tried this with intel's c/c++ compiler or other compilers? i'd be curious to see if what you're seeing is a result of how gcc is limited in the number of optimizations it can apply directly to the parse tree because it can't assume (at that stage) a particular target machine.
Re:your mileage will vary, but... (Score:2)
Not true. Not true at all. Multi-D arrays are merely syntactic sugar for single-dimension arrays. It doesn't involve two pointer dereferences unless it's a jagged array, aka an array of pointers. A built-in C multi-D array
Re:your mileage will vary, but... (Score:2)
Re:your mileage will vary, but... (Score:2)
That's true for jagged arrays, but statically allocated multidimensional arrays in C are not jagged. A[i][j] is equivalent to ((int *)A)[i*c + j], where c is the number of columns (C arrays are row-major).
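Pinning that down with a compileable sketch:

    #include <stdio.h>

    int main(void)
    {
        int A[3][4];                     /* one contiguous block of 12 ints */
        int *flat = &A[0][0];
        int i = 2, j = 3, c = 4;         /* c = number of columns */

        A[i][j] = 42;
        printf("%d\n", flat[i * c + j]); /* prints: 42 - row-major layout */
        return 0;
    }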
Use Java instead .... (Score:2, Funny)
GCC experimental results (Score:5, Interesting)
The loop from the array version:

L13:
movzbl -1(%ebx), %edx
movl %esi, %ecx
decl %edi
movl 8(%ebp), %eax
movb %dl, -13(%ebp)
movzbl -1(%esi,%eax), %edx
movb %dl, -1(%ebx)
decl %ebx
movzbl -13(%ebp), %edx
movb %dl, -1(%esi,%eax)
incl %esi
cmpl %ecx, %edi
jg L13
The loop from the pointer version:
L5:
movzbl 1(%esi), %edx
movl %esi, %ecx
movzbl (%ebx), %eax
movb %al, 1(%esi)
decl %esi
movb %dl, (%ebx)
incl %ebx
cmpl %ecx, %ebx
jb L5
Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s
So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.
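For anyone wanting to reproduce this: the poster didn't show a driver, but something along these lines, timed with the shell's time builtin (which matches the 0m4.515s output format), would do - reversestring_array is the function from the article, the rest is my sketch:

    #include <stdlib.h>
    #include <string.h>

    void reversestring_array(char *str);   /* swap in reversestring_pointer to compare */

    int main(void)
    {
        char *s = malloc(10001);
        if (!s) return 1;
        memset(s, 'x', 10000);            /* 10,000 character string */
        s[10000] = '\0';

        for (int i = 0; i < 100000; i++)  /* 100,000 iterations */
            reversestring_array(s);

        free(s);
        return 0;
    }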
Re:GCC experimental results (Score:3, Interesting)
GCC is designed to be portable, not fast, so the code it generates is often pretty bad compared to specialised, platform-specific compilers. Obviously your test is relevant if GCC is the compiler you'll be using, but for serious performance work it's pretty much irrelevant what GCC generates, because no-one uses it when native alternatives are available anyway. In fact, your example code here is a great demonstration of this!
Re:GCC experimental results (Score:2, Insightful)
Re:GCC experimental results (Score:2)
Re:GCC experimental results (Score:2)
Re:GCC experimental results (Score:2, Interesting)
Re:GCC experimental results (Score:3, Insightful)
Aren't the hundreds of Linux distributions out there enough to prove that assumption wrong? GCC is used regardless of its speed because it's free.
Re:GCC experimental results (Score:2)
Re:GCC experimental results (Score:3, Informative)
Unfortunately, the article didn't say what compiler he was using. But since we're giving data points:

gcc 3.4.2 [FreeBSD] 20040728, x86, -O3 -march=pentium-m. Generated essentially the same code as the article's.
Array version:
Pointer code:
And C++/STL? (Score:5, Informative)
Seriously though, there is no generalized answer. Good compilers will do what you want. Bad compilers (and there are more than you realize out there) will make lousy code.
If your target involves an environment where you might be using a more primitive compiler, or you can't predict the environment and compiler, it might be an issue. This is why code like the PNG and JPEG libs goes for tight/cryptic. As well, the performance of the runtime platform (CPU, memory, bus) has a bearing. In some cases, though one piece of assembly might look less efficient than the other, the sheer brute force of CPU parallelism, out-of-order execution and other black juju might render it meaningless.
Finally, you have to consider the cost/benefit of the situation. Making cryptic fast code is worthwhile if you're writing some wicked FFT code or an image processor's main loop, where it'll run a few quadrillion times. Other places in the same codebase, though, it's probably worthwhile to trade absolute performance for a bit better code readability and maintainability.
Remember, profile before optimizing. Only optimize things that will really make a significant performance difference. Rewriting the UI display loop to use pointers instead of lists is probably pointless. Heh. Pointless.
I'm a big fan of C++ STL containers now. If I _know_ a block of code is going to be a critical bottleneck, I'll use something else from the start. I've known people who coded UIs in assembly, for no good reason, and others who wrote image processing code in interpreted RAD scripting environments. I've written and optimized code (C++/C and assembly) on systems all the way back to the 6502 (yay! two 8-bit index registers!) and it's not hard, as long as you proceed sensibly and logically.
That being said, the Microsoft VC++6 compiler (still in common use today) has a terrible code generator. It fails to perform simple loop invariant hoisting operations that my old SAS/C++ compiler (Amiga, yah, I'm one of _them_...) did in 1990. VC++7/2003 and Whidbey/2005 are showing signs of being MUCH more caught-up, and the Intel and Codeplay compilers (despite Intel's AMD-phobia) are much better too.
When performance really counts, a whole new set of tools and processes come into play.
Re:And C++/STL? (Score:2, Funny)
ON THE OTHER HAND, the commercial proprietary alternative I'm competing against loads the equivalent data. But that data consumes 25Megs, and takes five seconds to load! You would think i
Re:And C++/STL? (Score:3, Informative)
Now, the interesting part is what the optimizing compiler outputs for each of the three variants (I only include the inner loop):
use array (Score:2)
These days, if it's something stupid like using one syntax over another, that optimization has been built into the compiler. The only real speed enhancements are going to be algorithmic. You know, the old things that make sense: take stuff out of loops that doesn't need to be inside, compiler
TFA is a Troll (Score:2)
If you answer this question, you fuel a discussion that should be constrained to a very very small domain of C development. Otherwise, you're going to see needless flag waving about coding standards in domains where it doesn't need to exist. Don't feed the dirty C programmer (we like other info, really, we do).
And frankly, TFA's answer depends on the architecture, though most modern compilers make both snippets equal.
As Donald Knuth said... (Score:4, Insightful)
Premature optimization is the root of all evil.
I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing, where it just leads to more compact code, and I'll use the array indexing when I have an actual bona fide array...
In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycles off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:
is less readable than this:
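(The poster's two examples didn't survive the page scrape; a pair in the same spirit - my reconstruction, not the original code:)

    /* Pointer style: compact, but the reader must track two moving pointers. */
    void copy_ptr(char *dst, const char *src)
    {
        while ((*dst++ = *src++) != '\0')
            ;
    }

    /* Index style: longer, but the bounds and accesses are explicit. */
    void copy_idx(char *dst, const char *src)
    {
        int i;
        for (i = 0; src[i] != '\0'; i++)
            dst[i] = src[i];
        dst[i] = '\0';
    }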
Re:As Donald Knuth said... (Score:2)
Re:As Donald Knuth said... (Score:2)
Pointers are for programmers silly rabbit!! (Score:2, Insightful)
You shouldn't make any assumptions ... (Score:3, Insightful)
Honestly, there is no real justification for making any assumptions. It depends on how good a particular C compiler is at generating code for a particular ISA. Indeed, for some ISAs I'd expect a naive C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word-addressed ISAs, where a 'char *' has to be represented as a word pointer plus a character offset.)
In the big picture, it probably doesn't make a great deal of difference to performance which style you use. It might make a 5-10% difference in a tight loop, but probably much less across a large application. If the difference in performance is significant, you will get more benefit for your effort by:
As other people have said, other issues than this are likely to have a greater bearing on the overall performance of a typical application; e.g. data structures, algorithms, database design, etc.
Vectorisation (Score:3, Insightful)
I'm not sure about various compilers and what they do in this case, but following the progress of GCC 4's vectorisation, it looks much more likely that the pointer case is passed over by the vectoriser and ends up being (way) slower than the easy-to-vectorise array-index version.
Like I said, I'm not sure what the actual situation in practice is, but it's worth looking into. The difference between vectorised SSE code and plain old x86 code (for example) is WAY greater than the trivial, insignificant difference between the two examples you posted.
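To make the contrast concrete - a sketch of the two loop shapes; whether a given GCC 4.x actually vectorises either one is an empirical question:

    /* Countable loop, affine indices: the classic vectorisable shape. */
    void add_arrays(float *a, const float *b, const float *c, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }

    /* Same computation in pointer-marching form: harder for a
       vectoriser to recognise, and historically often left scalar. */
    void add_pointers(float *a, const float *b, const float *c, int n)
    {
        float *end = a + n;
        while (a < end)
            *a++ = *b++ + *c++;
    }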
ASSume (Score:3, Insightful)
You're right; however, you're also assuming that your pointer arithmetic is faster.
Consider a 16-bit architecture with 32-bit pointers.
Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.
So stop assuming and stick with what you're comfortable with. If you prefer pointers, fine. If you prefer arrays, fine. But if you're so concerned with speed, you'd be doing it in assembly.
From the Compiler Trenches (Score:5, Informative)
I'll admit that I'm always slightly bemused by these sorts of discussions. That bemusement quickly turns to disinterest after I realize that a lot of people are burning a lot of cycles arguing about it.
Quite frankly, your example has little relevance in the real world. The optimization you are talking about is covered by strength reduction, as others have pointed out. But that's not the point of my message. This sort of piddly optimization means almost nothing when one looks at the whole application. If this piece of code is in an inner loop that takes 90% of the application time and it proves to be the bottleneck, then one can think about taking a closer look.
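(For readers who haven't met the term: strength reduction replaces a costly operation in a loop with a cheaper incremental one - a minimal sketch, not this poster's code:)

    /* Source form: an induction-variable multiply on every iteration. */
    void zero_stride4(int *a, int n)
    {
        for (int i = 0; i < n; i++)
            a[i * 4] = 0;
    }

    /* What strength reduction effectively emits: the multiply becomes
       a pointer bumped by a constant stride. */
    void zero_stride4_reduced(int *a, int n)
    {
        int *end = a + 4 * n;
        for (int *p = a; p < end; p += 4)
            *p = 0;
    }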
We have customers that come to us all the time with just such examples. They literally tell us, "You missed an opportunity to use a register here," and we know it's important because we know the customers are serious about profiling code and finding bottlenecks. So when they come to us we are happy to look at the "piddly stuff."
I've seen all kinds of different code. They basically fall into two categories: code that the compiler can do something significant with and code the compiler has no hope of automatically optimizing in any truly meaningful way. When I say "significant" and "meaningful," I explicitly mean "not Dragon Book stuff, except for scheduling (which can provide a significant performance win)" Simple optimizations like common subexpression elimination and copy propagation are more useful at enabling other optimizations than in any cycles gained in their own right. They are important, but not to make the code run significantly faster in and of themselves.
If one writes an application that does a lot of branching and pointer chasing (say, like, a traditional compiler*) then there's not much an optimizer can do with it. The aliasing difficulties alone will kill most optimizations. It's more important to write these kinds of applications in an understandable way because that is where the programmer time is most costly.
That said, judicious use of directives for compilers that support them can go a long way toward making these kinds of codes run really fast. Think of threading a tree search, for example. But the compiler is not going to have much hope of converting such a low-level piece of code without help.
An example of code that a compiler can do really, really well on are the traditional scientific applications. Here parallelism is everything, be it data-level, thread-level or at the distributed memory level (for really big machines). In these cases the loop optimizations are orders of magnitude more important speed-wise than sequential optimization (though, as I said, sequential optimization can enable some of these loop transformations). Some of the more important loop restructuring transformations are:
When the compiler is done with these, one hardly recognizes the code anymore. :)
In my experience, the fundamental problem is that compilers are really hard to understand. People argue about what a compiler can and cannot do because they are enormously complex systems that require arcane knowledge of language standards and hardware architecture to really dig into. It's slightly less difficult to understand the broad strokes, for example, simple cases of vectorizable loops. It's a lot more difficult to understand how to parallelize a loop that compresses a sparse array into a dense one.
If there's one lesson that I like to convey to programmers, it's to not sweat the optimization details. Don't hand-optimize the code. I can tell you fro
Benchmarking (Score:2, Insightful)
Pointer aliasing (Score:3, Interesting)
If you want to learn more about these kinds of source-level optimisations, the AMD Athlon(TM) Processor x86 Code Optimization Guide is a good reference -- it includes a section on why you want to use array accesses instead of pointers, and also has a lot more up-to-date and useful advice on what compilers do to your code. It is available from AMD's website.
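The aliasing problem in miniature (a sketch; restrict is C99, and compiler behaviour varies):

    /* The compiler must assume dst and src may overlap, so a store to
       dst[i] could change a later src[j]; that blocks reordering and
       vectorisation. */
    void scale(float *dst, const float *src, float k, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }

    /* C99 restrict promises no overlap, re-enabling those optimisations. */
    void scale_r(float * restrict dst, const float * restrict src,
                 float k, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }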
Where have all the geeks gone? (Score:4, Interesting)
No one will probably read this comment because it's been a day since the OP, but I'm amazed at the quantity of people who are slamming this guy for wanting to research something that's admittedly interesting.
For starters, if the submitter is a CompSci student then he definitely gets my kudos. Too many CS students are just focused on "I wanna learn C# so I can go make money!" as opposed to actually LEARNING.
Secondly, what happened to just plain geekiness of researching and studying things because it's fun and interesting? Does everything we do have to have some specific applicable purpose? If you say yes, you are thinking like the MBAs that always get bashed around here instead of like a real nerd.
Who knows? It's unlikely, but possible that thinking about this problem somehow leads to a train of thought that solves P=NP or something.
Re:Hmm.. (Score:3, Informative)
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references
Preview! Gah.
Re:Hmm.. (Score:2)
Re:Hmm.. (Score:2)
The unix philosophy lives on: if the system is cumbersome and user unfriendly, it's the user's fault!
Re:Hmm.. (Score:2)
Re:Hmm.. (Score:2)
Sure, it would be nice to have an "allow edit within 3 minutes" function like those found on WWW boards. OTOH, Slashdot generates several times the traffic of other sites with web boards.
Re:Hmm.. (Score:2)
Xnews? Bah! (Score:2)
Re:Here's the answer (Score:2)
Re:Professor Cormen said... (Score:4, Informative)
First:
``int i[20] followed by int *k = i, then i[4] is the same as *(k + (4 * 4))''
You're trying to get the 5th element of the array by using an offset of 4 times 4, assuming sizeof(int) == 4. First off, don't make that assumption; always write sizeof(int) when that's what you mean. Secondly, the C compiler automatically multiplies your offset by the size of the elements, so you would have to write *(k + 4) instead.
Secondly, you're not getting the point (probably, you were misled by the headline, as I was). The question is not whether array expressions really behave as pointers to their first element (in most contexts they decay to exactly that), but whether, say, iterating through an array by updating a pointer is faster than iterating by using an index variable as an offset. In other words, it's not whether a[0] is the same as *a, but whether while(*a) a++; is faster than while(a[i]) i++;.
Re:Professor Cormen said... (Score:2)
Ok, I'll elaborate.
When you say while(a[i]) i++;, you do two adds each iteration: one to increment i, and one to add i to the address of a. When you say while(*a) a++;, you do only one add per iteration. The OP's question is whether optimizing compilers get rid of one of the adds, or not.
This is different from asking which of a[4] and *(a + 4) is faster. In that case, the loops would have looked like while(a[i]) i+=; and while(*
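Spelled out as compilable functions (hypothetical helpers, purely to pin down the two idioms):

    #include <stddef.h>

    /* Index form: increment i, then compute a + i for the load. */
    size_t len_index(const char *a)
    {
        size_t i = 0;
        while (a[i])
            i++;
        return i;
    }

    /* Pointer form: a single increment of p per iteration. */
    size_t len_pointer(const char *a)
    {
        const char *p = a;
        while (*p)
            p++;
        return (size_t)(p - a);
    }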
Spelling and grammar and punctuation, oh my! (Score:5, Funny)
Re:Effective cache use will be a better optimisati (Score:2)
And if you can't find/figure out a good algorithm yet, don't do really strange stuff. That way someone later can more easily figure out how to fix your code.
Re:Effective cache use will be a better optimisati (Score:2, Funny)
The previous writer of the function had carefully removed all the array accesses, using only pointers that were incremented by fixed constants as the code proceeded. He had also carefully maintained an array of the sums of the results as we moved from pixel
Re:Register indexing. (Score:2)
I don't see any register indexing in IA64. No immediate offset, either. If you want to access an address, you'd better have that address - exactly - in a register. At least you get enough registers.
Incidentally, the IBM S/360 was an early machine (perhaps the first) with base+index+offset addressing. I've heard that the three-way adder required was a performance hit. I know that some models had instructions which were fa