Arrays vs Pointers in C?

Cliff posted more than 8 years ago | from the optimizations-may-not-be-portable dept.

Programming 308

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?"

"Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:

#include <string.h>  /* for strlen */

void reversestring_array(char *str)
{
    int head, tail;
    char temp;
    if (!str) return;
    tail = strlen(str) - 1;
    for (head = 0; head < tail; ++head, --tail)
    {
        temp = str[tail];
        str[tail] = str[head];
        str[head] = temp;
    }
}
The second function uses pointers:
void reversestring_pointer(char *str)
{
    char *phead, *ptail, temp;
    if (!str) return;
    ptail = str + strlen(str) - 1;
    for (phead = str; phead < ptail; ++phead, --ptail)
    {
        temp = *ptail;
        *ptail = *phead;
        *phead = temp;
    }
}
While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown until run time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:
forloop:
    mov bl,byte ptr [esi+edx]
    mov al,byte ptr [ecx+edx]
    mov byte ptr [ecx+edx],bl
    mov byte ptr [esi+edx],al
    inc esi
    dec ecx
    cmp esi,ecx
    jl forloop
The pointer function:
forloop:
    mov bl,byte ptr [ecx]
    mov dl,byte ptr [eax]
    mov byte ptr [eax],bl
    mov byte ptr [ecx],dl
    inc ecx
    dec eax
    cmp ecx,eax
    jb forloop
While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."


Hmm.. (-1, Offtopic)

Surye (580125) | more than 8 years ago | (#13777393)

Hmm... You must be new here.

Re:Hmm.. (2, Informative)

Surye (580125) | more than 8 years ago | (#13777414)

Oops, messed up the tag at the top, and it ate the quote portion:

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references

Preview! Gah.

Re:Hmm.. (1)

Strike (220532) | more than 8 years ago | (#13777657)

Preview! Gah.
You must be new here.

Re:Hmm.. (1)

Alomex (148003) | more than 8 years ago | (#13778264)

Of course, because a feature such as "edit" is only for wimps...

The unix philosophy lives on: if the system is cumbersome and user unfriendly, it's the user's fault!

Re:Hmm.. (1)

cortana (588495) | more than 8 years ago | (#13778486)

I think that people making idiots out of themselves is far preferable to trolls editing their +5 moderated posts into Goatse ASCII art.

Re:Hmm.. (1)

Hast (24833) | more than 8 years ago | (#13778580)

There is no edit function on Slashdot because down that road lie edit wars and other problems. (Specifically, there are issues with editing + moderation, as the mod should reflect the original article and not the edit.)

Sure, it would be nice to have an "allow edit within 3 minutes" function like the ones found on web boards. OTOH, Slashdot generates several times the traffic of other sites with web boards.

Re:Hmm.. (1)

Alomex (148003) | more than 8 years ago | (#13778632)

or how about an "editing undoes the moderation" feature, just as posting to a thread one has moderated undoes any moderation one might have done therein....

Why do you care? (4, Insightful)

Julian Morrison (5575) | more than 8 years ago | (#13777402)

For any real programming task, the question has to be: why do you care about that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.

Re:Why do you care? (1, Redundant)

Macphisto (62181) | more than 8 years ago | (#13777455)

Yeah, no kidding. Also, the discussion is pretty meaningless unless you consider the compiler, target architecture, etc. It's a safe bet that this sort of semantically equivalent code generates assembly with no performance advantage either way for any compiler made in like the last bazillion years. Oh yeah, who writes C code anymore?

(me... so lonely in here...)

Re:Why do you care? (1)

Blakey Rat (99501) | more than 8 years ago | (#13777688)

I agree. Both of those pieces of code happen so fast that it doesn't matter. Unless all your program does is reverse lists over and over and over...

Re:Why do you care? (0)

Anonymous Coward | more than 8 years ago | (#13777765)

RTFA. This is just an example. He isn't interested in the performance of these specific statements. He wants to know which method is faster in general.

Re:Why do you care? (1)

jguthrie (57467) | more than 8 years ago | (#13778458)

he wants to know which method is faster in general.

That's like asking "What's the sound of yellow". "In general" the question is not answerable. In my opinion, the question is also misguided. This sort of thing is what Hoare was talking about when he said "Premature optimization is the root of all evil". Making the program understandable and maintainable is far more important than trying to squeeze out every last ounce of performance.

Oh, and Mr. Macphisto, I still write C code. I write a lot of C code. I write in quite a number of other languages, too, some of my own devising, but C is a goodly chunk of it.

Re:Why do you care? (4, Funny)

aled (228417) | more than 8 years ago | (#13778122)

Both of those pieces of code happen so fast that it doesn't matter.

Unless of course someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger, all programmers must start programming in TI-99/4a BASIC right now.

Re:Why do you care? (4, Interesting)

thegrassyknowl (762218) | more than 8 years ago | (#13778087)

then don't wank around optimizing

Dude, best use of the word wank. Ever!

for single cycles on a machine that probably bleeds off a million cycles every time you raise a window

Computers have become more powerful and programmers have become lazier. That's not strictly true: instead of spending their time writing efficient code, programmers now spend it writing a lot of code to fill bigger machines. That million cycles is wasted doing crap, and probably half of it doesn't need to be done anyway...

I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days where "nobody will ever need more than 640k" software was designed and optimised to be small and chew up few cycles.

Now, with RAM and Gigahertz available for next to nix software has just bloated out. It's nice to see a programmer thinking about efficiency/size even if it is purely academic. We should be encouraging that; I know I'd like my applications to work faster and carry less crap than they do currently.

Re:Why do you care? (0)

Anonymous Coward | more than 8 years ago | (#13778186)

Amen

In my experience (those old days..) it was the transition from C to C++ that started the bloat.

Re:Why do you care? (1)

AuMatar (183847) | more than 8 years ago | (#13778667)

You are officially my hero now.

I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away. Well not if you waste it on bloatware. And at any rate, it doesn't improve the hardware I own. Efficiency will cease to matter as soon as they start buying my hardware and electricity.

Re:Why do you care? (1, Interesting)

(1+-sqrt(5))*(2**-1) (868173) | more than 8 years ago | (#13778963)

Well not if you waste it on bloatware.
A great analog of the expansionist principle is freeways: natural law dictates that usage, unhindered by external constraints, will rise to meet capacity.

Whereas C's ghost is parsimony and piety, décadence haunts the house of Java.

Re:Why do you care? (1)

Reality Master 101 (179095) | more than 8 years ago | (#13778468)

*sigh* And some people wonder why, in the age of 3GHz processors, they're still only as responsive as 33MHz processors.

Re:Why do you care? (5, Insightful)

JanneM (7445) | more than 8 years ago | (#13778761)

For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for; or how exhaustively you can search for features during one frame; or what resolution image you are able to work on.

If your code does something nice and graphical where 99% of the time is spent waiting for the user, sure, you're absolutely right. And if your system is doing something inherently bounded - it works until it's done, then it stops and waits until it's time again - then all you need is to make it fast enough and no faster. But there are real-world systems that today, and for the foreseeable future, are bounded by the available processing power and that can always benefit from any improvement in execution time.

Here's the answer (1, Offtopic)

Anonymous Crowhead (577505) | more than 8 years ago | (#13777472)

Stop hunting for zebras. There aren't any.

Re:Here's the answer (1)

cant_get_a_good_nick (172131) | more than 8 years ago | (#13777687)

Forgive my ignorance, but can anyone tell me the reference for this?

Re:Here's the answer (1)

Anonymous Crowhead (577505) | more than 8 years ago | (#13777720)

Basically, don't look for exotic causes for your problem. I.e., if your code doesn't do what you think it should be doing, don't assume there must be a problem with gcc (or whatever compiler you use).

Re:Here's the answer (1)

hackwrench (573697) | more than 8 years ago | (#13778269)

I know a counter-example reference.
HHG2G: Some scientist proved that black was white and subsequently got run over at the next zebra crossing.

A Google search reveals that a "zebra crossing" is British for a pedestrian crosswalk.

Hundreds of possibilities (2, Insightful)

topham (32406) | more than 8 years ago | (#13777473)


This question sounds to me like discussions I had ages ago with other programmers, and it was always one programmer trying to justify his method of coding a routine over some other, equally valid method.

Pointer arithmetic and arrays in C really have the same issues. You can access beyond the array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.

I never really saw the point in arguing over arrays and pointers in C. When I programmed in C I used both. I typically used arrays if I wanted the code to be obvious and straightforward, or if I wanted to do something else with the index, etc. I used pointers if I wanted speed above all else.

If I were writing for any modern PC (PPC970, Pentium IV, Athlon, etc...) I wouldn't worry about speed. For functions more complicated than shuffling a few bytes around, the algorithm will dictate the speed.

But not everything is a "modern PC". (1)

Roadkills-R-Us (122219) | more than 8 years ago | (#13777753)

There are still tons of smaller, slower, "antiquated" processors embedded in all sorts of things, for which code gets written. The Z80 is still alive and well, believe it or not, along with lots of other old standards (and newer, but still small and slow, CPUs).

Re:But not everything is a "modern PC". (1)

topham (32406) | more than 8 years ago | (#13778827)

very true.

but reading the submission it really looked to me like someone who simply wanted to justify their opinion on something. he wanted a generic answer to a question which requires a specific answer.

if someone here said the pointer method was always faster he would run back to his friends and declare he was right all along.

the fact is the answer requires knowledge of all the target platforms as well as all of the compilers involved.

pointer method is generally faster and more efficient; but it makes for difficult-to-read code which can be very difficult to debug or modify in the future, particularly after the code hasn't been looked at for some period of time. unless the particular code is executed a large majority of the time there is no reason to make it more difficult to read and modify.

please excuse the complete lack of caps. synergy seems to have decided my mac doesn't need capitals

There are more appropriate places to ask this... (0, Troll)

afroborg (677708) | more than 8 years ago | (#13777493)

Such as comp.lang.c.moderated [google.co.nz]

Do the kids still know what USENET is?

(XNews is still the best newsreader BTW...)

Xnews? Bah! (1)

jd (1658) | more than 8 years ago | (#13777530)

There is nothing to beat xvnews - particularly the icon that shows the economy collapsing whenever there is anything new to read.

WOW! (2, Insightful)

Anonymous Coward | more than 8 years ago | (#13777514)

Now THIS is the kind of discussion that should be going on at Slashdot!

Re:WOW! (0)

Anonymous Coward | more than 8 years ago | (#13777632)

Now THIS is the kind of discussion that should be going on at Slashdot!

What are you talking about? Almost all Ask Slashdots are like this. A retarded question accepted by a retarded editor who, being retarded himself, didn't realize what a retarded question the retarded submitter posed.

Re:WOW! (0)

Anonymous Coward | more than 8 years ago | (#13777871)

Yeah right, there are more idiots here than anywhere else. Lots of wannabe programmers and such hanging out at this place. This kind of discussion has no place here; no one will be able to have an intelligent conversation.

Professor Cormen said... (1)

Evro (18923) | more than 8 years ago | (#13777516)

One of the mantras hammered into my head during my freshman CS class was:

"The name of an array is a pointer to its 0th element."

Another favorite was:

"Never dereference a null pointer."

From what I remember, arrays and pointers are interchangeable. I've forgotten the C syntax, but I believe that if you have int i[20] followed by int *k = i, then i[4] is the same as *(k + (4 * 4)). I.e., they're both references to specific locations in memory. I don't know about differences across architectures, but I'd imagine if that didn't hold true on a particular system then that system wouldn't really be implementing C correctly. Then again, I'm not much for languages.

Re:Professor Cormen said... (3, Informative)

RAMMS+EIN (578166) | more than 8 years ago | (#13777779)

You're not completely right.

First:

``int i[20] followed by int *k = i, then i[4] is the same as *(k + (4 * 4))''

You're trying to get the 5th element of the array by using an offset of 4 times 4, assuming sizeof(int) == 4. First off, don't make that assumption; always write sizeof(int) when that's what you mean. Secondly, the C compiler automatically multiplies your offset by the size of the elements, so you would have to write *(k + 4) instead.

Second, you're not getting the point (probably, you were misled by the headline, as I was). The question is not whether variables holding arrays are really holding pointers to the arrays (they are), but whether, say, iterating through an array by updating a pointer is faster than iterating by using an index variable as an offset. In other words, it's not whether a[0] is the same as *a, but whether while(*a) a++; is faster than while(a[i]) i++;.
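
To make the comparison concrete, here is a minimal sketch of the two idioms as complete functions (an illustration only; the function names are made up for the example):

#include <stddef.h>

/* Pointer idiom: dereference, then bump the pointer. */
size_t len_by_pointer(const char *a)
{
    const char *p = a;
    while (*p)
        p++;
    return (size_t)(p - a);
}

/* Index idiom: a[i] is *(a + i) by definition. */
size_t len_by_index(const char *a)
{
    size_t i = 0;
    while (a[i])
        i++;
    return i;
}

The interesting question is whether the compiler turns the second form into the first.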

Re:Professor Cormen said... (1)

CableModemSniper (556285) | more than 8 years ago | (#13777937)

you mean:

while(a[i++]);
vs.
while( *(a++) );

*wink*

Re:Professor Cormen said... (1)

Evro (18923) | more than 8 years ago | (#13778498)

I'd actually typed sizeof(int) but I wasn't sure if that was valid, or if you had to do sizeof(i). Regarding the multiplication, I think I may have been thinking of malloc().

As for your second point, I don't see how that's at all different from the first one. It seems like the question is akin to asking whether it takes more time to compute 2+2 or 1+3.

Re:Professor Cormen said... (1)

Bill Dog (726542) | more than 8 years ago | (#13778863)

I'd actually typed sizeof(int) but I wasn't sure if that was valid, or if you had to do sizeof(i).

You can do either. Although "sizeof" is an operator, not a function, so you don't need parens to use it. So you do "sizeof i". Or "sizeof (int)", where "(int)" is a typecast allowing you to use a type instead of a variable with "sizeof".

As for your second point, ...akin to asking whether it takes more time to compute 2+2 or 1+3

It's akin to asking whether it takes more time to compute pa + sizeof *a or a + i * sizeof *a.

is this really your bottleneck? (2, Insightful)

eyal (774028) | more than 8 years ago | (#13777528)

When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?

One can always check the resulting assembly code, if one is so concerned about this.

Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)

lol u r stupid lol (-1, Troll)

Anonymous Coward | more than 8 years ago | (#13777544)

Welcome to 1980. You are a moron. Give up on programming and sell newspapers.

The eighties called... (3, Funny)

Anonymous Coward | more than 8 years ago | (#13777553)

...and they want their arrays, pointers and C back.

Unfortunately, the 2000s answered... (1)

Anonymous Brave Guy (457657) | more than 8 years ago | (#13778281)

...and said they can't have them because all these new-fangled alternatives suck when you've got serious work to do.

Re:Unfortunately, the 2000s answered... (0)

Anonymous Coward | more than 8 years ago | (#13778352)

Hah. I guess that's why there's so many web apps written in C these days, right?

Oh wait, there isn't. Fucking elitist dipshit.

Re:Unfortunately, the 2000s answered... (0)

Anonymous Coward | more than 8 years ago | (#13778390)

Yeah... like APACHE

Oh wait, there are. Fucking ignorant dipshit.

Re:Unfortunately, the 2000s answered... (0)

Anonymous Coward | more than 8 years ago | (#13778420)

web apps? we were talking about *serious* work.

bloody script kiddies.

Re:Unfortunately, the 2000s answered... (1)

UOZaphod (31190) | more than 8 years ago | (#13778563)

Isn't the entire Linux kernel written in C?

I wasn't talking about database front ends and other softcore programming when I submitted the article.

Re:Unfortunately, the 2000s answered... (1, Insightful)

Anonymous Coward | more than 8 years ago | (#13778513)

all these new-fangled alternatives suck

Right, that's why I use Common Lisp.

In my experience ... (1, Interesting)

Keeper (56691) | more than 8 years ago | (#13777555)

In my experience, which is somewhat dated (by about 5 years), what you state is generally true for simple loops.

However, for more complicated loops the compiler fails to optimize very well. My 'experience' was in writing an algorithm which took raw camera CCD sensor data and ordered it into a proper Bayer pattern. When moving from index-based access to pointer-based access I saw a significant increase in performance.

Even basing one pointer off of another was slower than maintaining separate pointers and manipulating them via arithmetic.

The loop looked pretty nasty when it was done (it also mixed several other stages of the image processing pipe at once, such as application of color correction & calibration data, dead pixel correction, etc.; probably had something like 30 pointers that had to be maintained), but it was quick.

I ended up following the same approach for the code converting the raw CCD data to a live preview, and for the code which generated a thumbnail preview, obtaining the same kind of gains.

When I was done with the thumbnail code, it was capable of generating 108 fully processed thumbnails (color 768x512 8bpc images) per second on a 700MHz PIII from a 6 megapixel source image (3072x2048, 16bpc Bayer image). I was proud of that at the time...

It optimizes out (5, Interesting)

klossner (733867) | more than 8 years ago | (#13777571)

An optimizing compiler, such as gcc -O, will rearrange the array code into the pointer code -- it doesn't require a base-index address mechanism. This is called strength reduction [wikipedia.org].

Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
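
To see what strength reduction does here, consider this hedged sketch (illustrative C, not actual gcc output; the function names are invented):

/* Before: each iteration computes the address as a + i*sizeof(int). */
void scale_indexed(int *a, int n, int k)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] *= k;
}

/* After, in effect: the per-iteration multiply in the address
   computation is replaced by a pointer that steps one element. */
void scale_reduced(int *a, int n, int k)
{
    int *p, *end;
    for (p = a, end = a + n; p < end; p++)
        *p *= k;
}

The second form is roughly what a strength-reducing compiler derives internally from the first, which is why the two source styles tend to produce the same machine code.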

Re:It optimizes out (1)

SilverspurG (844751) | more than 8 years ago | (#13777863)

Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
Kudos. :) My hat's off to you.

Re:It optimizes out (4, Funny)

LSD-OBS (183415) | more than 8 years ago | (#13778077)

Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree

Hah, yeah. Last week in a public toilet in London, someone drew an arrow pointing to the toilet roll, and it said "Computer degrees - please take one".

Guess that about sums it up ;-)

Re:It optimizes out (1)

Detritus (11846) | more than 8 years ago | (#13778550)

In the "real world", not all compilers are that intelligent.

Yes, it does make a substantial difference in speed with some systems/compilers.

I echo the above statements (4, Insightful)

21chrisp (757902) | more than 8 years ago | (#13777573)

These types of optimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors with the array option far outweigh any speed increase from the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.

IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

Rubbish. (1)

idries (174087) | more than 8 years ago | (#13777833)

IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

Obviously you have never worked with low-power architectures or non-mainstream compilers. While this particular example may well be too simple to justify using pointer arithmetic in any environment, in a more complex example that type of thing may well be justified (especially on more exotic architectures).

Just expecting compilers to do a good job is no substitute for being able to ensure that they do and being able to do a good job for them if they don't.

Re:I echo the above statements (4, Interesting)

NullProg (70833) | more than 8 years ago | (#13778357)

These types of opimizations are virtually pointless on modern machines.
I call bullshit. Optimizations are important regardless of the language or CPU.

My Pentium III test machine with 256Meg of RAM blew away a dual-processor Intel system with 1Gig of RAM while parsing a 30Meg XML import/export file.

It took over six hours on the dual processor system with the native .Net and Java XML parsers. And yes, the original programmers tried several different methods/libraries to tweak the code (Sax, Xerces, whatever). They got it down to a best of four and a half hours.

My C program parsed it in an hour and a half. And yes, I used pointers. Why? Because it's more efficient.

Time is money, especially when you're trying to push down 20,000 price changes from the Mainframe to 2000+ POS units during the off hours. The solution? We put my C routine into a shared library callable via C# or Java. Bonus: the 'C' code gets it done under an hour on the dual-CPU machine. And yes, I tested the inputs for overflows, security problems, whatever, before we went into production. There's a big difference between a programmer who knows a language and a programmer who understands it.

IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
That's your opinion and I'm glad you shared it. You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic on the CPU, don't you? I wouldn't want to work for you, though. Your competitors will write a faster program that uses less memory and you will lose the contract/job.

No flame intended,
Enjoy.

Re:I echo the above statements (2, Interesting)

Arandir (19206) | more than 8 years ago | (#13778988)

There are optimizations and then there are optimizations. If you run this particular loop once, or twice, it won't matter. Run it ten times or even a hundred, it won't matter. In these cases it would be a pointless optimization. Profile your software before you optimize. That way you can optimize where you need it and not waste time where you don't.

I remember one code review where someone told me to use prefix instead of postfix notation, for optimization reasons. Yet it occurred in an initialization routine in a background thread that would run once per day. That's like worrying about a memory leak in a power-off interrupt handler...

Arrays vs. Pointers determining speed (0)

Anonymous Coward | more than 8 years ago | (#13777583)

A[0] is a pointer to a byte in memory. A[1] is a pointer to that byte + 1 * (size of an element of that type) in memory. So shouldn't this be irrelevant, with both taking equal time?

Either way, both access memory in O(n) time, so it doesn't matter.

Different methods (1)

jd (1658) | more than 8 years ago | (#13777586)

Different compilers will optimize abstractions in different ways, so you are not guaranteed to get the same code when using arrays. Pointer operations should(!) compile to essentially the same code on ANY compiler, as the code is already about as reduced as it can get.


In consequence, if you want a predictable program on ANY compiler, pointer operations are probably a better bet than arrays, regardless of how well compilers handle those arrays.


If you want something of uniform reliability, rather than uniform performance, you're better off with arrays. Arrays are easier to bounds-check in source form, errors are generally easier to spot, etc.


If you want something that is FAST, then you're probably not wanting to use either. For example, if you've a sequence of characters, and want to copy them, then copying them a character at a time - regardless of how - is slow and inefficient on a 32-bit or 64-bit machine. Cast the data onto the largest unit the CPU can pull out of memory in one operation, and operate in bulk/parallel.
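
A hedged sketch of that bulk-copy idea (the function name is invented; it assumes the buffers don't overlap, and uses memcpy per word so misaligned pointers remain legal C):

#include <string.h>

void bulkcopy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    /* Move word-sized gulps while we can... */
    while (n >= sizeof(unsigned long)) {
        memcpy(d, s, sizeof(unsigned long));
        d += sizeof(unsigned long);
        s += sizeof(unsigned long);
        n -= sizeof(unsigned long);
    }
    /* ...then finish the tail a byte at a time. */
    while (n--)
        *d++ = *s++;
}

Of course, any decent libc memcpy already does exactly this (and more), so the portable answer is usually just to call memcpy.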

Re:Different methods (1)

Xyrus (755017) | more than 8 years ago | (#13778317)

How the code gets treated depends on how "clear" it is to the compiler. The loop in the example is simple, and any decent optimizing compiler worth its 1s and 0s should optimize it.

But if the code is unclear, let's say multiple nested loops with a good number of branches, local variables, etc., then an optimizing compiler will probably say WTF and just leave it alone.

As has been said many times, profile your code. If you find the bottleneck, fix it. If it comes down to pointer vs. array, get rid of the array.

"For example, if you've a sequence of characters, and want to copy them, then copying them a character at a time - regardless of how - is slow and inefficient on a 32-bit or 64-bit machine. Cast the data onto the largest unit the CPU can pull out of memory in one operation, and operate in bulk/parallel."

Careful with that. If you're doing cross architecture code you'll have to keep in mind big-endian vs. little-endian.

~X~

Being pedantic perhaps... (1)

SteveAyre (209812) | more than 8 years ago | (#13777590)

Being pedantic perhaps, but you might get a buffer overflow in your code if for some reason the string doesn't have a terminating '\0' at the end.

Better would be to call the subroutine with (char *str, size_t length) and to replace strlen(str) with length.

The subroutine then becomes useful more often too - you can then reverse portions of strings as well as the entire string if you want to.
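
A sketch of that variant (hypothetical name and signature, following the suggestion above):

#include <stddef.h>

void reversestring_n(char *str, size_t length)
{
    char *phead, *ptail, temp;
    if (!str || length < 2) return;  /* nothing to swap; also sidesteps the empty-string case */
    phead = str;
    ptail = str + length - 1;
    while (phead < ptail) {
        temp = *ptail;
        *ptail = *phead;
        *phead = temp;
        ++phead;
        --ptail;
    }
}

Called as reversestring_n(s, strlen(s)) it matches the original; called as reversestring_n(s + off, len) it reverses just a portion.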

True in General, not true in reality (1)

gbrandt (113294) | more than 8 years ago | (#13777595)

There are quite a few compilers out there for embedded CPUs and DSPs that are still in the stone age. In embedded programming it is not safe to assume that the compiler will do any optimization, so you write the C code that is known to produce faster assembly yourself.

That's just a fact of life for people in the trenches.

I suspect that for OS-level compilers (GCC, VS) the opposite is true, and you can assume a decent level of optimization is available.

Gregor

Re:True in General, not true in reality (0)

Anonymous Coward | more than 8 years ago | (#13779016)

But since most code will never be compiled on any of those systems, the argument is moot.

Target your systems. (That doesn't mean write sloppily and not care about portability.)

But basically, if you aren't running on ancient compilers, don't worry about it.

Strength Reduction (3, Informative)

The boojum (70419) | more than 8 years ago | (#13777607)

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
Yes, semantically array[index] has to have the same effect as *(array+index). But the compiler is free to generate conceptually equivalent code in any way that it pleases. Any decent C/C++ compiler optimizer that can perform strength reduction ought to be able to see how the index changes the memory location and turn it into simple pointer incrementation accordingly. And strength reduction is a well known optimization that's been around for ages -- if memory serves, even the old Red Dragon book talks about how it works in this context. If your compiler can't handle this you need to find a better compiler.

Do This instead (2, Interesting)

Leimy (6717) | more than 8 years ago | (#13777616)

void reversestring_array(char *str)
{
    int head, tail;
    char temp;
    if (!str) return;
    tail = strlen(str) - 1;
    for (head = 0; head < tail; ++head, --tail)
    {
        temp = tail[str];
        tail[str] = head[str];
        head[str] = temp;
    }
}

That'll teach ya.

Re:Do This instead (1)

forkazoo (138186) | more than 8 years ago | (#13777984)

Bah, the original requirement was to do it with as little memory allocation as possible. You still use a temp variable. Let bitwise operations be your friend.

Re:Do This instead (1)

Hast (24833) | more than 8 years ago | (#13778511)

As was discussed in the original thread, one point this exercise misses is that a compiler will naturally put the temporary variables into registers. So the entire XOR thing is cute but will produce slower and less understandable code.

Naturally, neither version will actually use any added memory (as in RAM), but I doubt an optimising compiler will be able to perform strength reduction on the XOR operations.

Don't try to out-think your compiler. The compiler is your friend.

Register indexing. (1)

idries (174087) | more than 8 years ago | (#13777620)

This is an interesting post. Much better than all the politics etc. that we've seen recently on /.

I can't think of any modern architectures that don't support register indexing. Can anyone else?

On another note, although at first glance it may appear that the poster is correct I think that a good compiler written for an architecture that didn't support register indexing (M68000 does, so it's going to have to be something very old) would probably be able to figure out what's going on in this code sample. i.e. in the array version within the for loop head and tail are only used as indexes into the array, so converting them into straight pointers would be trivial. Of course in a more complex example YMMV.

Re:Register indexing. (1)

clem.dickey (102292) | more than 8 years ago | (#13778500)

> I can't think of any modern architectures that don't support register indexing

I don't see any register indexing in IA64. No immediate offset, either. If you want to access an address, you'd better have that address - exactly - in a register. At least you get enough registers. :-).

Incidentally, the IBM S/360 was an early machine (perhaps the first) with base+index+offset addressing. I've heard that the three-way adder required was a performance hit. I know that some models had instructions which were faster when the index register field was 0, indicating no indexing.

there is a difference... (1)

simcop2387 (703011) | more than 8 years ago | (#13777638)

The array function:
        mov bl,byte ptr [esi+edx]
        mov al,byte ptr [ecx+edx]
        mov byte ptr [ecx+edx],bl
        mov byte ptr [esi+edx],al

The pointer function:
        mov bl,byte ptr [ecx]
        mov dl,byte ptr [eax]
        mov byte ptr [eax],bl
        mov byte ptr [ecx],dl

If you look closely at these two portions of the assembly, the resulting code is not the same at all: the array version has four additional additions (between registers), which on some architectures can take a non-trivial amount of time (though typically not significant; the memory access is usually longer). In this case the pointer function should be at least slightly faster than the array one. Though, as everyone else has said, make sure this is truly a bottleneck of your program before you go doing useless optimizations.

P.S.
Also, on other architectures (non-x86) this could be a trivial amount of time, if they were designed to handle this (two registers added together as a base and offset, basically) specifically for memory access.

Re:there is a difference... (1)

SillyNickName4me (760022) | more than 8 years ago | (#13778043)

Got to think of the 6502 and its indirect indexed addressing modes...

Basically, you'd store a pointer somewhere in memory page 0, and then use the X or Y register as an index into whatever it pointed at (i.e. LDA ($a0),x would take the pointer stored in addresses $a0 and $a1, use the X register as an index, and read the contents of the resulting address into the accumulator). Here it would be faster to manipulate the index and not the pointer.

Pointers were 16-bit (64k address space) and X and Y were 8-bit registers, so when you wanted to address beyond the boundary of a (256-byte) memory page, you'd have to manipulate the pointer anyway.

Re:there is a difference... (0)

Anonymous Coward | more than 8 years ago | (#13778130)

Yes, and it could be even better with...
char *tail, t;
if (str)
        {
        for (tail = str; *tail; tail++);  /* instead of strlen */
        for (tail--; tail > str; str++, tail--)
                {
                t = *tail; *tail = *str; *str = t;
                }
        }

Re:there is a difference... (1)

simcop2387 (703011) | more than 8 years ago | (#13778192)

well, personally i'd suggest this

*tail ^= *str; *str ^= *tail; *tail ^= *str;

for the swap, as this will be faster on many more architectures. and it's more memory efficient.

Re:there is a difference... (0)

Anonymous Coward | more than 8 years ago | (#13778266)

True, and that way only one temporary variable is needed for the function (the tail ptr), as logic analysis indicates.

Re:there is a difference... (1)

UOZaphod (31190) | more than 8 years ago | (#13778467)

Since the original ended up using registers, it couldn't be any more efficient.

Before I even posted it, I tried it with the XOR swap. The code ended up looking very convoluted. I don't think the compiler recognized the XOR swap design.

Re:there is a difference... (4, Informative)

Waffle Iron (339739) | more than 8 years ago | (#13778211)

what you in fact have is in the array 4 additional additions (between registers),

Actually, most x86s have dedicated address generation units which handle those index additions in parallel, on logic separate from the main ALU. So both cases would actually run at the same speed on a modern x86.

Spelling, YAY! (1)

CestusGW (814880) | more than 8 years ago | (#13777666)

Oh my god! You said piqued! Not peaked, or piked, or any other dumb thing that editors let slip by in mainstream publications, but piqued! I love you.

Spelling and grammar and punctuation, oh my! (4, Funny)

Roadkills-R-Us (122219) | more than 8 years ago | (#13777789)

I didn't see any errors in punctuation or grammar, either. I don't recall the last time I saw a post of that length which didn't confuse plural's (sic) with possessives.

There cannot be any difference. (2, Informative)

Anonymous Coward | more than 8 years ago | (#13777674)

The whole point is dumb.
Every C programmer should know the answer to the following question:

What is 6["abcdefghijkl"]?

Answer: 'g'.

How is this determined?
By definition x[y]=*(x+y)=y[x].

Don't believe me, check the standard.

(Yeah, this is a degenerate case, like Duff's device. Still, a compiler has to support it.)
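
A minimal demo, for the skeptical:

#include <stdio.h>

int main(void)
{
    /* x[y] is *(x + y), and addition commutes, so
       6["abcdefghijkl"] == "abcdefghijkl"[6] == 'g'. */
    printf("%c\n", 6["abcdefghijkl"]);  /* prints: g */
    return 0;
}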

Effective cache use will be a better optimisation (1)

WouldIPutMYRealNameO (874377) | more than 8 years ago | (#13777675)

Firstly - who really cares? This kind of optimisation almost never occurs as far as I can tell. Secondly, counting CPU instructions hasn't been accurate for years. Thirdly, for any kind of memory access loop, cache misses will most likely be your biggest performance hit. The best way to actually improve this loop is probably going to be by...
1) if the length of the data is 4k, just do the algorithm
2) otherwise, stride through the data in page-size increments reading 1 32bit word, then do the algorithm

But seriously - either you are working in a system so small and slow that you are already an expert in this stuff, or this kind of optimisation doesn't matter.
Choose quicksort instead of bubble sort.

Re:Effective cache use will be a better optimisati (1)

TheLink (130905) | more than 8 years ago | (#13778440)

Yeah that's what I figured, use a good algorithm (e.g. N vs N^2 ) and as long as the processing loop fits in cache you'll be fine.

And if you can't find/figure out a good algorithm yet, don't do really strange stuff. That way someone later can more easily figure out how to fix your code.

Re:Effective cache use will be a better optimisati (2, Funny)

functor0 (89014) | more than 8 years ago | (#13778494)

I have a good story for this one. In my second co-op term, I was asked to improve the speed of the general 2D convolution image filter for a well-known commercial paint package. Nothing complicated, just run the given 2D kernel through the image.

The previous writer of the function had carefully removed all the array accesses, using only pointers that were incremented by fixed constants as the code proceeded. He had also carefully maintained an array of the sums of the results as we moved from pixel to pixel, to avoid calculating anything twice. You would think it's pretty fast, eh?

I came across two problems with the code:

1. His array of sums was actually a queue and so he shifted the entire array by one element for every single pixel. Using a ring buffer instead quickly solved this one.

2. I then came to realize that he was traversing the row-major image in columns. Cache locality was being shot to hell because none of the cache lines were being reused as it went from pixel to pixel. I rewrote the function to go by rows instead, and guess what the speed improvement was? Something like 3 times!

Go figure. Some people optimize and get it entirely wrong.
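
A hedged sketch of that second fix (hypothetical function names; assumes a row-major image of rows x cols bytes):

/* Fast: the inner loop walks consecutive addresses, so every
   cache line fetched is fully used before it is evicted. */
void brighten_by_rows(unsigned char *img, int rows, int cols)
{
    int r, c;
    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++)
            img[r * cols + c] += 1;
}

/* Slow: successive accesses are cols bytes apart, so nearly
   every access touches a different cache line. */
void brighten_by_cols(unsigned char *img, int rows, int cols)
{
    int r, c;
    for (c = 0; c < cols; c++)
        for (r = 0; r < rows; r++)
            img[r * cols + c] += 1;
}

Same arithmetic, same number of memory accesses; only the access order differs, and that alone can account for a severalfold speedup on large images.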

You can... (1)

Bin_jammin (684517) | more than 8 years ago | (#13777713)

always assume anything you want. It won't make it correct though.

In summary... (1)

SmurfButcher Bob (313810) | more than 8 years ago | (#13777882)

I've been reading some really good arguments in these posts about how it'll work. And the bulk of them end up saying, "any good compiler will/should"...

In other words, "No, you cannot."

Your best bet is to do (almost) what you did - compile a test case per platform, check the code - and then do the part that you neglected: count the clock ticks per instruction. To reinforce what one poster said, there can be a large difference between [ecx] and [ecx+edx], conceptually speaking. If it truly matters, then you count the ticks for each platform and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.

Re:In summary... (2, Funny)

Anonymous Brave Guy (457657) | more than 8 years ago | (#13778327)

If it truly matters then you count the ticks for each platform, and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.

Not necessarily. You're ignoring the fact that modern processors use pipelining architectures, branch prediction, extensive caching often at several levels, and a whole host of other things that mean the total time required is not the sum of the individual times for each instruction.

One of these days, someone will invent a program that can take some more abstract representation of what we want to do, and automatically generate optimised machine code from it on any given platform. Yeah, there's an idea... I can C it now!

Re:In summary... (1)

SmurfButcher Bob (313810) | more than 8 years ago | (#13778745)

Yes, necessarily.

"When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" is the quesiton.

"You're ignoring the fact that modern processors use pipelining architectures, " you propose.

As I said, the bulk of the posts end up saying, "any good compiler will/should"...

In other words, "No, you cannot."

But even in the case with chip-level optimizations, your point is still moot; for those processors it doesn't make any difference for you - call them all 1 tick. Of course, some code is more pipelinable and can be parallelized better than other code... as you well know, so it still matters. That's not his question, anyway. If you read his question carefully, you'll notice the word "ALWAYS" floating around. This is a very strong qualification that you seem to have missed. Last time I checked, 8051s were still in production, not to mention the large pile of other embedded "super-micro" processors out there. Very few support pipelining or prediction, or even caching for that matter. Does my Palm Pilot support branch prediction? How about my PocketPC? Or my generic WinCE device?

"Always." Right?

differences not having to do with optimization (1)

bersl2 (689221) | more than 8 years ago | (#13777935)

int array[8];
int *int_ptr = malloc(8 * sizeof(int));
char string[] = "Hello world!";
char *char_ptr = "Hello world!";

sizeof(array) == 8 * sizeof(int);
sizeof(int_ptr) == sizeof(int *);

You can get int_ptr to point to another location in memory (such as doing ++int_ptr); the similar statement (++array) is illegal.

I think that altering individual characters of char_ptr is undefined (as it would result in changing the value of the literal string), while (as above) string cannot be pointed to a new literal string.

Re:differences not having to do with optimization (0)

Anonymous Coward | more than 8 years ago | (#13778234)

You can alter the chars of char_ptr all you want, it's just a pointer to some bytes (that happen to be ascii chars) on the stack. Of course you don't want to do stupid stuff like kill the null terminator, but nothing stops you from messing with it.
That's why you should be using some const action to keep people from screwing with stuff that shouldn't be touched.

Re:differences not having to do with optimization (1)

Anonymous Brave Guy (457657) | more than 8 years ago | (#13778347)

You can alter the chars of char_ptr all you want, it's just a pointer to some bytes (that happen to be ascii chars) on the stack.

Nope, sorry, he was right and you're wrong. In the code given, char_ptr points to a string literal, which is automatically constant data and can be stored somewhere completely different to either the stack or the heap. Attempting to change that data results in undefined behaviour.

(For completeness: in C++, you can only even point a (non-constant) char * to a string literal because of a deprecated hack in the language spec that allows you to violate the type system, though I'm not sure C99 picked up on that one.)

your mileage will vary, but... (3, Informative)

blackcoot (124938) | more than 8 years ago | (#13778055)

... my experience has been that this matters more in the multidimensional array case than in the single dimensional array case (for those who are curious: my goal is to write algorithms which do non-trivial amounts of processing on VGA or larger video at full frame rates [>= 15Hz]; any time i can make array operations faster my entire program benefits significantly). when dealing with two dimensional arrays, you can do the addressing yourself (location [i,j] in an r x c matrix maps to [i*c+j] in a flat array). if you are clever about how you explain your indexing to the compiler, it should realize that you're passing through consecutive addresses in memory and generate code accordingly. if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.
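
A sketch of the two layouts being compared (hypothetical accessors; note that a true C array like float A[r][c] is itself flat, so the double dereference described above arises for pointer-to-pointer row tables, which is what this models):

/* Flat layout: one address computation, one load. */
float get_flat(const float *m, int c, int i, int j)
{
    return m[i * c + j];
}

/* Row-table layout: load rows[i] first, then dereference again,
   with a possible cache miss on each step. */
float get_nested(float *const *rows, int i, int j)
{
    return rows[i][j];
}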

have you tried this with intel's c/c++ compiler or other compilers? i'd be curious to see if what you're seeing is a result of how gcc is limited in the number of optimizations it can apply directly to the parse tree because it can't assume (at that stage) a particular target machine.

Use Java instead .... (2, Funny)

icepick72 (834363) | more than 8 years ago | (#13778076)

... and then you will no longer have to worry that it might be slow.

GCC experimental results (5, Interesting)

RML (135014) | more than 8 years ago | (#13778091)

Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version:

L13:
movzbl  -1(%ebx), %edx
movl    %esi, %ecx
decl    %edi
movl    8(%ebp), %eax
movb    %dl, -13(%ebp)
movzbl  -1(%esi,%eax), %edx
movb    %dl, -1(%ebx)
decl    %ebx
movzbl  -13(%ebp), %edx
movb    %dl, -1(%esi,%eax)
incl    %esi
cmpl    %ecx, %edi
jg      L13

The loop from the pointer version:

L5:
movzbl  1(%esi), %edx
movl    %esi, %ecx
movzbl  (%ebx), %eax
movb    %al, 1(%esi)
decl    %esi
movb    %dl, (%ebx)
incl    %ebx
cmpl    %ecx, %ebx
jb      L5

Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s

So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.
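
A harness along these lines reproduces the test (a reconstruction from the description above, not the actual code used; run each build under time(1) to get timings like those quoted):

#include <stdlib.h>
#include <string.h>

void reversestring_array(char *str);    /* the versions from the article */
void reversestring_pointer(char *str);

int main(void)
{
    enum { LEN = 10000, ITERS = 100000 };
    char *s = malloc(LEN + 1);
    int i;
    if (!s) return 1;
    memset(s, 'x', LEN);
    s[LEN] = '\0';
    for (i = 0; i < ITERS; i++)
        reversestring_array(s);  /* swap in reversestring_pointer to compare */
    free(s);
    return 0;
}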

Re:GCC experimental results (2, Interesting)

Anonymous Brave Guy (457657) | more than 8 years ago | (#13778368)

GCC is designed to be portable, not fast, so the code it generates is often pretty bad compared to specialised, platform-specific compilers. Obviously your test is relevant if GCC is the compiler you'll be using, but for serious performance work it's pretty much irrelevant what GCC generates, because no one uses it when native alternatives are available anyway. In fact, your example code here is a great demonstration of this!

Re:GCC experimental results (2, Insightful)

UOZaphod (31190) | more than 8 years ago | (#13778488)

What is used to compile the Linux Kernel?

Re:GCC experimental results (1)

yarbo (626329) | more than 8 years ago | (#13778698)

gcc, icc, or tcc. The majority of people probably use gcc, but tcc and icc are definitely possibilities.

Re:GCC experimental results (1)

Nelson (1275) | more than 8 years ago | (#13778940)

Use an actual release of the compiler and report this regression to the gcc mailing list. It's an error.

And C++/STL? (4, Informative)

XenonOfArcticus (53312) | more than 8 years ago | (#13778103)

And what of STL container classes under C++?

Seriously though, there is no generalized answer. Good compilers will do what you want. Bad compilers (and there are more than you realize out there) will make lousy code.

If your target involves an environment where you might be using a more primitive compiler, or you can't predict the environment and compiler, it might be an issue. This is why code like the PNG and JPEG libs goes for tight/cryptic. As well, the performance of the runtime platform (CPU, memory, bus) has a bearing. In some cases, though one piece of assembly might look less efficient than the other, the sheer brute force of CPU parallelism, out-of-order execution and other black juju might render it meaningless.

Finally, you have to consider the cost/benefit of the situation. Making cryptic fast code is worthwhile if you're writing some wicked FFT code or an image processor main loop, where it'll run a few quadrillion times. In other places in the same codebase, though, it's probably worthwhile to trade absolute performance for a bit better code readability and maintainability.

Remember, profile before optimizing. Only optimize things that will really make a significant performance difference. Rewriting the UI display loop to use pointers instead of lists is probably pointless. Heh. Pointless.

I'm a big fan of C++ STL containers now. If I _know_ a block of code is going to be a critical bottleneck, I'll use something else from the start. I've known people who coded UIs in assembly, for no good reason, and others who wrote image processing code in interpreted RAD scripting environments. I've written and optimized code (C++/C and assembly) on systems all the way back to the 6502 (yay! two 8-bit index registers!) and it's not hard, as long as you proceed sensibly and logically.

That being said, the Microsoft VC++6 compiler (still in common use today) has a terrible code generator. It fails to perform simple loop invariant hoisting operations that my old SAS/C++ compiler (Amiga, yah, I'm one of _them_...) did in 1990. VC++7/2003 and Whidbey/2005 are showing signs of being MUCH more caught-up, and the Intel and Codeplay compilers (despite Intel's AMD-phobia) are much better too.

When performance really counts, a whole new set of tools and processes come into play.

use array (1)

dave1g (680091) | more than 8 years ago | (#13778137)

If you use arrays, the compiler these days will convert to pointer arithmetic. However, it also now knows that it is an array, and can make assumptions not available to it before, allowing more code-reordering optimizations in loops.

These days, if it's something stupid like using one syntax over another, that optimization has been built into the compiler. The only real speed enhancements are going to be algorithmic. You know, the old things that make sense: take stuff out of loops that doesn't need to be inside (the compiler will do that, so don't even bother making the change unless it increases readability), blah blah blah...

TFA is a Troll (1)

mugnyte (203225) | more than 8 years ago | (#13778138)


If you answer this question, you fuel a discussion that should be constrained to a very very small domain of C development. Otherwise, you're going to see needless flag waving about coding standards in domains where it doesn't need to exist. Don't feed the dirty C programmer (we like other info, really, we do).

And frankly, TFA's answer depends on architecture, though most modern compilers make both snippets equal.

Re:TFA is a Troll (1)

UOZaphod (31190) | more than 8 years ago | (#13778801)

I submitted the article because I am genuinely interested in the idea (which opposes my original thinking) that I can get away with using arrays and not take a performance hit.

I am happy if it is true, because using arrays results in code that is easier to maintain and read.

When submitting the article, I made it a point to avoid using inflammatory language, and went out of my way to present both sides of the issue, even going as far as making a case for the opposing view.

In fact, when I brought the idea up the first time in the other thread, my own posts were met with a harsh remark about why I was wrong. That comment, while not as friendly as I would prefer, is what got me interested in the idea. I posted to Ask Slashdot because I wanted to hear some more level-headed opinions on the matter that didn't include inflammatory language.

From the replies I've seen so far, I can see there are indeed some critical thinkers reading this site who are nice enough to respond in kind.

Correctness trumps illusory speed improvement (0)

Anonymous Coward | more than 8 years ago | (#13778249)

Your array vs pointer question is interesting from a navel-gazing perspective, but of no importance in practice. In practice, the overriding concern is readability and correctness. Performance is only important if benchmarking shows that the routine is a bottleneck.

In fact, your pointer example has a subtle bug: if strlen(str) == 0 then you can have ptail = str - 1 which is strictly illegal (you cannot point to a location outside of a memory object except for exactly one element past the end, which is then illegal to dereference). I know that no modern systems would fail with this code, but I expect it would screw up on some memory models with old MS-DOS C compilers.

In short, you should always write the simplest, most readable code, then optimise later if there is a problem. Both array indexing and pointer versions can be written well, so there is no reason to prefer either. By the way, I would mark you down for using !ptr when ptr == NULL is the more readable idiom.
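
A minimal guard for that empty-string case (one way to write it; checking the length up front keeps ptail from ever being computed as str - 1):

#include <string.h>

void reversestring_pointer_safe(char *str)
{
    char *phead, *ptail, temp;
    size_t len;
    if (!str) return;
    len = strlen(str);
    if (len < 2) return;  /* empty or 1-char string: nothing to swap, no str - 1 */
    for (phead = str, ptail = str + len - 1; phead < ptail; ++phead, --ptail) {
        temp = *ptail;
        *ptail = *phead;
        *phead = temp;
    }
}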

As Donald Knuth said... (4, Insightful)

jschmerge (228731) | more than 8 years ago | (#13778441)

Premature optimization is the root of all evil.

I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use the array indexing when I have an actual bona fide array...

In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycle off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:

(ptr + index)->dataMember

is less readable than this:

ptr[index].dataMember

Pointers are for programmers silly rabbit!! (2, Insightful)

Da_Weasel (458921) | more than 8 years ago | (#13778477)

Pointers are for programmers, not for speed. A particular compiler's implementation of the C specification may produce better or more efficient code when using pointers, but I seriously doubt that every implementation would behave the same, even across different architectures.