
Don't Overlook Efficient C/C++ Cmd Line Processing

CmdrTaco posted more than 7 years ago | from the also-don't-eat-yellow-snow dept.

Programming 219

An anonymous reader writes "Command-line processing is historically one of the most ignored areas in software development. Just about any relatively complicated piece of software has dozens of available command-line options. The GNU tool gperf is a "perfect" hash function generator: for a given set of user-provided strings, it generates C/C++ code for a hash table, a hash function, and a lookup function. This article provides a good discussion of how to use gperf for effective command-line processing in your C/C++ code."
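As a rough illustration of the workflow the summary describes (the option names and struct below are made up, not taken from the article), a gperf input file pairs each keyword with user data, and the tool emits the hash and lookup code:

```
%{
/* options.gperf -- generate with: gperf -t options.gperf > options.c */
#include <string.h>
%}
struct option_entry { const char *name; int id; };
%%
--help, 1
--version, 2
--verbose, 3
--output, 4
%%
```

Running gperf with -t on this file would then emit a hash() function and an in_word_set() lookup that returns a pointer to the matching option_entry, or NULL, in constant time for this fixed keyword set.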


i tooted (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#20032233)

~8 i r dobernala geeklord terd on my hunnies they heart this! *l*

Re:i tooted (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#20032255)

i tooted
Beans, beans, the musical fruit. The more you eat, the more you toot.

Speed in options parsing? (5, Insightful)

tot (30740) | more than 7 years ago | (#20032263)

I would not consider the speed of command-line option processing to be a bottleneck in any application; the overhead of starting the program is far greater.

Re:Speed in options parsing? (3, Insightful)

ScrewMaster (602015) | more than 7 years ago | (#20032279)

I'd say the speed of human motor activity is an even greater limiting factor.

Re:Speed in options parsing? (3, Informative)

pete-classic (75983) | more than 7 years ago | (#20032359)

What a limited point of view. See "man system", for example.

-Peter

Re:Speed in options parsing? (2, Insightful)

Anonymous Coward | more than 7 years ago | (#20032291)

It's still handy to have a fairly comfortable way of generating code that does things needed every time (or at least very, very often) in an easily applicable and very optimized way. I like it.

Re:Speed in options parsing? (4, Informative)

ChronosWS (706209) | more than 7 years ago | (#20032301)

Indeed, what the hell? Now you have to have another tool and another source file for what is essentially declaring a dictionary in C++, which should be in any good developer's library? Yeesh.

If you don't like the nasty nested ifs, make the keys in your dictionary the command-line options and the values delegates, then just loop through the list of options passed on the command line, invoking the delegate as appropriate. That eliminates the ifs, there are no switch statements either, and each of your command-line arguments is now handled by a function dedicated to it, bringing all the benefits of compartmentalizing your code rather than stringing it out in one huge processing function.

Broken handling of vtables in linkers (4, Informative)

tepples (727027) | more than 7 years ago | (#20032439)

Now you have to have another tool and another source file for what is essentially declaring a dictionary in C++, which should be in any good developer's library?
Due to the brokenness of how some linkers handle virtual method lookup tables, using anything from the C++ standard library tends to bring in a large chunk of dead code [wikipedia.org] from the standard library. I compiled hello-iostream.cpp using MinGW and the executable was over 200 KiB after running strip, compared to the 6 KiB executable produced from hello-cstdio.cpp. Sometimes NIH syndrome produces runtime efficiency, and on a handheld system, efficiency can mean the difference between fitting your app into widely deployed hardware and having to build custom, much more expensive hardware.

Re:Broken handling of vtables in linkers (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#20032517)

I compiled hello-iostream.cpp using MinGW and the executable was over 200 KiB after running strip, compared to the 6 KiB executable produced from hello-cstdio.cpp.
HOLY SHIT! 194KB BIGGER?! HOW WILL YOU EVER FIND THE SPACE FOR SUCH A HUGE EXECUTABLE?!?!

All the world is not a PC (5, Insightful)

tepples (727027) | more than 7 years ago | (#20032623)

HOLY SHIT! 194KB BIGGER?! HOW WILL YOU EVER FIND THE SPACE FOR SUCH A HUGE EXECUTABLE?!?!
I develop for a battery-powered computer with 384 KiB of RAM. In such an environment, what you appear to sarcastically call a "mere couple hundred kilobytes" is a bigger deal than it is on a personal computer manufactured in 2007.

Re:All the world is not a PC (1)

sholden (12227) | more than 7 years ago | (#20032823)

And you do so using MinGW and c++?

devkitARM (3, Informative)

tepples (727027) | more than 7 years ago | (#20032903)

And you do so using MinGW and c++?
Yes, I do so with devkitARM [devkitpro.org] (a cross-compiling GCC toolchain that is itself compiled with MinGW) and C++.

Re:devkitARM (1)

sholden (12227) | more than 7 years ago | (#20033067)

What the toolkit is compiled with is irrelevant. You're not using it unless you are compiling code targeted at MS Windows, which I don't think you are. Doing the iostream-versus-stdio hello world on local gcc gives a difference of 496 bytes, hence my guess that the way MinGW links libraries might be the reason for the bloat. And since MinGW targets win32, bloat is simply not an issue.

Byte counts when compiled with devkitARM (1)

tepples (727027) | more than 7 years ago | (#20033251)

What the toolkit is compiled with is irrelevant. You're not using it unless you are compiling code targeted to MS Windows, which I don't think you are.
I knew that. But I have generally seen overheads of the same magnitude when using standard C++ libraries on devkitARM as on MinGW. I just tried it on the GBA: 5,156 bytes for hello-world.mb, which just pushes a C string straight into agbtty_puts(), and 253,652 bytes for hello++.mb, which pushes output through a std::ostringstream and then into agbtty_puts(). (The limit for a .mb executable is 262,144 bytes, as the other 128 KiB of RAM in the system is specialized.)

Doing the iostream versus stdio hello world on local gcc gives a difference of 496 bytes
What "local" platform are you talking about? Does it use a dynamically linked C++ standard library?

Re:All the world is not a PC (0, Flamebait)

Urusai (865560) | more than 7 years ago | (#20033049)

I'd say a bigger deal is your pretentious use of kibibytes (KiB).

Re:All the world is not a PC (1, Interesting)

Anonymous Coward | more than 7 years ago | (#20033165)

"I develop for a battery-powered computer with 384 KiB of RAM. In such an environment, what you appear to sarcastically call a "mere couple hundred kilobytes" is a bigger deal than it is on a personal computer manufactured in 2007."

I fail to see how this is a strong argument in this discussion. How many of the embedded tools you write actually _do_ command-line processing? If they do, why not invest in more (both memory- and time-) efficient ways to do IPC than the command line?

Character encoding conversion (2, Informative)

tepples (727027) | more than 7 years ago | (#20033723)

How many of these embedded tools you write actually _do_ command line processing?
None yet, but they do handle other things that involve dictionaries, such as character encoding conversion. A program designed to move items back and forth between a town in Animal Crossing (for Nintendo GameCube) and a town in Animal Crossing: Wild World (for Nintendo DS) needs to be able to understand the encodings of character names and town names that these games use, possibly by converting between their proprietary 8-bit codecs and UTF-8.

why don't you invest in more (both memory- and time-) efficient ways to do IPC than the command line?
Because the command line, pipes, and sockets are the most obvious ways for two programs to communicate if their copyright licenses prohibit them from being linked together into one executable.

only relevent to static linking (4, Informative)

sentientbrendan (316150) | more than 7 years ago | (#20033479)

It sounds like the author is statically linking his library and running on an embedded system. It is not surprising in that case that the C++ standard library brings in much more code than the C standard library, but it should be made clear that this is not relevant to desktop developers, pretty much all of whom dynamically link with glibc.

Again, to be clear, dynamically linking with the c++ standard library is not going to increase your executable size. Please don't try to roll your own code that exists in the standard library. It is a real nuisance when people do that.

I should qualify that by saying that template instantiations do (of course) increase executable size, but that they do so no more than if you had rolled your own.

Which platform uses dynamic libstdc++? (2, Insightful)

tepples (727027) | more than 7 years ago | (#20033817)

It is not surprising in that case that the c++ standard library brings in much more code than the c standard library, but it should be made clear that it is not relevant to desktop developers, pretty much all of which dynamically link with glibc.
On MinGW, the port of GCC to Windows OS, my programs dynamically link with msvcrt, not glibc. Also on MinGW, libstdc++ is static, just like in the embedded toolchain. Are you implying that one of the C++ toolchains for Windows uses a dynamic libstdc++? Which toolchain for which operating system that is widely deployed on home desktop computers are you talking about?

Re:Speed in options parsing? (1)

hxnwix (652290) | more than 7 years ago | (#20032403)

Except on Windows XP, where pipe performance degraded an order of magnitude as compared to Windows 2000.

Re:Speed in options parsing? (5, Funny)

Anonymous Coward | more than 7 years ago | (#20032473)

You're not a real programmer if you won't over-optimize irrelevant parts of your code.

Re:Speed in options parsing? (4, Funny)

Maniac-X (825402) | more than 7 years ago | (#20033515)

Klingon function calls do not have 'parameters' - they have 'arguments.' AND THEY ALWAYS WIN THEM!

Re:Speed in options parsing? (3, Insightful)

canuck57 (662392) | more than 7 years ago | (#20032479)

I would not consider speed of command line option processing to be bottleneck in any application, the overhead of starting of the program is far greater.

You're just experiencing this with Java, Perl, or some other high-overhead bloated runtime. When people pull out a heavyweight needing a 90 MB VM, or a 5-10 MB base library calling a cat's breakfast of shared libraries, I would agree. But let's take C-based awk, for example: it is only an 80 KB draw. It runs fast, is nice and general-purpose, and does a good job of what it was designed to do. It can be pipelined in and out and used directly on the command line, as it has proper support for stdin, stdout, and stderr. On my system, it takes only 10 disk blocks to load.

While fewer people are proficient at it, C/C++ will outlast us all as a language. Virtually every commodity computer today uses it at its core. Many others have come and gone, yet all our OSes and scripting tools rely on it. So any doomsday predictions would be premature, and if you want fast, efficient, lean code, you do C/C++....

Re:Speed in options parsing? (0, Insightful)

Anonymous Coward | more than 7 years ago | (#20032573)

C/C++ will outlast us all for a language.
There's no such language as C/C++.

Re:Speed in options parsing? (2, Insightful)

Anonymous Coward | more than 7 years ago | (#20032485)

Indeed. The applications of perfect hashing (and minimal perfect hashing) are quite limited. Basically it only makes sense if you need to quickly identify strings from a fixed, finite set of strings known at compile time. And, as with all optimizations, only if that part of your program is a bottleneck or you are prepared to optimize all other aspects of your program as well.

The traditional example application for perfect hashing was identifying keyword tokens when building a compiler, but for complex modern languages like C++ parsing source code is just a very tiny fraction of the compilation process. And even that scenario makes more sense than parsing command line options.

I doubt there is a single application that significantly benefits from hashed lookup of command line options. Suggesting that it makes sense to spend your time increasing the complexity of your application for a practically immeasurable improvement in performance is insanity.

Re:Speed in options parsing? (1)

eokyere (685783) | more than 7 years ago | (#20032577)

as somebody already mentioned, speed in options parsing is pretty useless, and I could use commons-cli (in Java) or the Groovy CliBuilder for cmdline options that arguably look cleaner and more accessible to a lot more people:

def cli = new CliBuilder(usage: "foo [args] baz")
cli.i(argName: "path", longOpt: "input", args: 1, required: false, "src")
cli.o(argName: "path", longOpt: "output", args: 1, required: false, "dest")
cli.h(longOpt: "help", "this message")
def options = cli.parse(args)
if (!options || !options.i || options.h) {
    println "foobaz ver 0.0.1"
    cli.usage()
    return
}
// rest of code

Re:Speed in options parsing? (4, Funny)

ai3 (916858) | more than 7 years ago | (#20032857)

You must not have seen the recent proposal for GNU tools options, which will require four dashes instead of two and a minimum of four words per option. Under a UN/EU funded program to ease the transition to intelligent machines, developers are rewarded for implementing full-sentence options and/or prose. But initial experiments showed that many users were unwilling to wait for the parsing of the command "remove-files --recursively-from-root-directory --do-not-ask-for-confirmation-just-delete --i-really-want-this!" just to be 1337, which led to whatever development efforts are mentioned in the article, which I didn't read.

Re:Speed in options parsing? (1, Funny)

Anonymous Coward | more than 7 years ago | (#20032941)

Yes, this will help maintainability but a consistent and standardized naming convention is helpful too.

function ifYouThoughtYouWereGoingToEditThisFileUsingATerminalBasedEditorThenYouNeedToThinkAgainOhYeahIIndentUsingTabsToo()
{
/* */
}

Too much (3, Insightful)

bytesex (112972) | more than 7 years ago | (#20032269)

I'm not sure that for the usually simple task of command line processing, I'd like to learn a whole new lex/yacc syntax thingy.

Joke? (0)

Anonymous Coward | more than 7 years ago | (#20032277)

This has to be a joke? Sheesh. Someone found a "new" toy?

Re:Joke? (4, Insightful)

iangoldby (552781) | more than 7 years ago | (#20032539)

Someone found a "new" toy?
Well I for one won't be using this to process command-line arguments (that's what getopt() and getopt_long() are for), but it is certainly useful to know of a tool that I can use to generate a perfect hash. The next time I need some simple but efficient code to quickly discriminate between a fixed set of strings, I'll know to Google for gperf. (Before I read this article I didn't even know it existed.)

Re:Joke? (1)

bumby (589283) | more than 7 years ago | (#20033121)

This was actually what I first thought it was to be used for, before I read the comments. I thought the summary was about how complicated command-line tools are to use, with all their options, and that gperf was an example of such a program.

Maybe overkill BUT... (1)

Derek Loev (1050412) | more than 7 years ago | (#20032319)

it does create some good-looking code.

C++ I get (-1, Troll)

WED Fan (911325) | more than 7 years ago | (#20032323)

O.k., I get C++. It's still viable, but C? Since I moved to C++ way, way, way back when, I haven't had much use for C; hell, I've even moved away from C++. So I haven't tracked projects or systems that still use C. Aside from those still doing C on ATMELs and PICs, who is using it?

Re:C++ I get (4, Insightful)

Anonymous Coward | more than 7 years ago | (#20032393)

I do. On MIPS, ARM, PPC, x86, and all the other embedded stuff. I don't think C will ever die - it's the universal assembler language.

Re:C++ I get (5, Funny)

V. Mole (9567) | more than 7 years ago | (#20032419)

There's this little project of which you may have heard: http://www.kernel.org/ [kernel.org]

Re:C++ I get (0)

Goalie_Ca (584234) | more than 7 years ago | (#20033107)

It only uses a 'subset' of c++ called c ;)

Re:C++ I get (1, Interesting)

hxnwix (652290) | more than 7 years ago | (#20032449)

Oh, gee, well, nobody except:

1) Every linux kernel developer
2) Every *BSD kernel developer
3) John Carmack, for the core of every ID engine up to and possibly beyond Doom3
4) You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).

Re:C++ I get (4, Interesting)

mce (509) | more than 7 years ago | (#20032567)

You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).

Excuse me???? That was not even true anymore when I started using C++, back in 1992. There are features in the C++ standard that are so extremely difficult to correctly implement in standard-compliant C that it's a complete waste of effort trying to pass via C while compiling. Exception handling comes to mind as the prime example. A failed attempt to support exceptions was the reason why Cfront 4.0 was abandoned. Note that 3.0 was released as early as 1991. The last Cfront-based compiler I had the horror of using was HP's CC. It was superseded by the new native aCC by 1994 at the latest.

By the way, I used to write C/C++ compilation/optimisation stuff for a living, so I guess I know something about the topic.... :-)

Re:C++ I get (1)

hxnwix (652290) | more than 7 years ago | (#20032815)

I used to write C/C++ compilation/optimisation stuff for a living, so I guess I know something about the topic....
Good guess. [wikipedia.org] Name decoration and limited knowledge of c++'s origins led me to conclude that most C++ compilers still act as front ends. So, we don't all use C anymore...

Re:C++ I get (1)

StripedCow (776465) | more than 7 years ago | (#20032909)

C is indeed not a good intermediate language for the reasons you mentioned.
But C-- may be (http://cminusminus.org/)
Perhaps the kernel developers should be coding in *that* language :-)

Re:C++ I get (0)

Anonymous Coward | more than 7 years ago | (#20033029)

At least the Comeau C++ compiler [comeaucomputing.com] still generates C code, and is known as one of the most portable and standard-compliant C++ compilers (including support for exported templates!). So compiling C++ to C is definitely a viable strategy (although I can understand compiler vendors that want to offer a complete toolchain take a different approach).

Re:C++ I get (1)

pclminion (145572) | more than 7 years ago | (#20033527)

There are features in the C++ standard that are so extremely difficult to correctly implement in standard compliant C that it's a complete waste of effort trying to pass via C while compiling.

The only thing I can imagine that would be hard to map directly onto C would be exceptions. Can you confirm that this is what you mean? Because nothing else comes to mind that would be "extremely difficult" to implement.

Even then, it's possible to emulate C++-style exceptions in C. I've done it -- the best description I can think of is "horrifically ugly." But it's possible.

Re:C++ I get (3, Informative)

mce (509) | more than 7 years ago | (#20033809)

Of course C++ exceptions are what I meant. What else would I mean when using the word "exceptions" in this context?

And yes, C++ exceptions can be expressed in C. After all, C is a glorified assembler, and the resulting code from C++ translation is assembler as well. It all depends on the level of abstraction at which the C code is written and on the amount of ugliness/inefficiency you're willing to take on board (and also the trade-off between the two). But that's not the point. The point of this thread is that nowadays it makes no sense to make use of this capability in a C++ compiler. Especially not when considering that a user of a C++ compiler wants more than just a compiler: he also wants a debugger that is able to meaningfully link up the binary and the original C++ source. If you're a C++ compiler vendor, using C as an IL does nothing but complicate your own life. Twice.

Re:C++ I get (4, Informative)

Enselic (933809) | more than 7 years ago | (#20032619)

You are wrong about 3):

The process of building the new engine went much more smoothly than anything we have done before, because I was able to do all the groundwork while the rest of the company worked on TeamArena. By the time they were ready to work on it, things were basically functional. I did most of the early development work with a gutted version of Quake 3, which let me write a brand new renderer without having to rewrite file access code, console code, and all the other subsystems that make up a game. After the renderer was functional and the other programmers came off of TA and Wolf, the rest of the codebase got rewritten. Especially after our move to C++, there is very little code remaining from the Q3 codebase at this point.

Source: http://archive.gamespy.com/e32002/pc/carmack/ [gamespy.com]


And 4) as well:

Historically, compilers for many languages, including C++ and Fortran, have been implemented as "preprocessors" which emit another high level language such as C. None of the compilers included in GCC are implemented this way; they all generate machine code directly. This sort of preprocessor should not be confused with the C preprocessor, which is an integral feature of the C, C++, Objective-C and Objective-C++ languages.

Source: http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/G_002b_002b-and-GCC.html [gnu.org]

Re:C++ I get (1)

mechsoph (716782) | more than 7 years ago | (#20032643)

You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64)

GCC parses C++ to its tree IR; there is no translation to C.

Wrong about 4 (or at least, very out of date) (1)

jdennett (157516) | more than 7 years ago | (#20032747)

It's been many years since most C++ compilers used C as an intermediate language. CFront did, and some EDG-based compilers do, but most current C++ compilers do not.

C does have its strengths, such as the relative simplicity of C90 and its lack of dependency on sophisticated compilers and runtimes, but its use as an IL is largely historical.

Re:C++ I get (1)

DirtySouthAfrican (984664) | more than 7 years ago | (#20032919)

I don't think C++ compilers compile to C anymore... I know Borland's TPC did this, but that was back when C++ was built on top of C.

Re:C++ I get (1)

PerlDudeXL (456021) | more than 7 years ago | (#20033187)

You, whenever you compile C++ code, as it is compiled to C before machine code

One of my Computer Science profs said something similar. He argued that C and C++ are basically the same outdated shit and professionals would only use Java in real-world applications. The best thing: he ran Ubuntu and all sorts of Gnome stuff on his laptop.

Re:C++ I get (2, Insightful)

iangoldby (552781) | more than 7 years ago | (#20032483)

I use C for any low-level programming project that doesn't warrant an object-oriented approach.

The trick is to identify the best tool for the job.

I'm doing it. (1)

www.sorehands.com (142825) | more than 7 years ago | (#20032503)

I'm currently rewriting Post Road Mailer, which is in C on OS/2. I also wrote an e-mail scanner. It all depends on what you need to do.

I did a phone interview for a job a couple of years ago: remote underwater sensor equipment. It had to run on battery; do you think they would have written it in C or C++? It would turn on the hard drive once in a while, once the flash drive was full.

The more you abstract something, the less efficient it becomes.

There are millions of lines of COBOL code still running.

"The Jenolan could probably fly rings around the Enterprise on impulse." Geordi LaForge.

Re:I'm doing it. (1)

DreadSpoon (653424) | more than 7 years ago | (#20032725)

The more you abstract something, the less efficient it becomes.
This is not at all true, especially not today. I'd trust an abstract container library to optimize its internals far more than I'd trust you or almost any other individual developer to do the same.

I trust my C compiler to get the very many high-level optimizations required by today's CPUs right more than I'd trust you or almost any other individual developer to do the same.

Yeah, sometimes those high level libraries or languages get things wrong, but that's not a given just because they're more abstract. It's merely an implementation bug.

If you had some C code that was inefficient compared to assembler code, then you just don't know how to write efficient C code or you were using a shit compiler.

I disagree (1)

www.sorehands.com (142825) | more than 7 years ago | (#20033389)

While I agree that most modern compilers can out-optimize the average programmer, you are still looking at generalities.

Both compilers and abstract container classes have to deal with generalities which may not apply to YOUR specific case. The class writer does not know the specific case or conditions (presuming you are not writing the class for that specific condition). A class writer has to (or should) check arguments and conditions, whereas if you know they have already been checked (and are damn well sure of it) you can skip that.

When writing an abstraction layer, you are adding a layer.

On the other hand, a good programmer would not try to optimize a bubble sort. I was working on a resource compiler (in DOS) back in 1989. It would take 45 minutes to 'compile'. I rewrote it to take about 3:15 minutes. But during the rewrite my AVL tree insert was taking forever: it would allocate the memory to do the insert, and when it found the word was already in the tree, it would free it. Deferring the allocation fixed that.

If you know the entire program/system you can better optimize.

Re:C++ I get (1, Interesting)

Anonymous Coward | more than 7 years ago | (#20032553)

Re:C++ I get (1)

AuMatar (183847) | more than 7 years ago | (#20032625)

Pretty much every embedded program in existence. Own a printer? That's several hundred thousand lines of C in there.

Re:C++ I get (0)

Anonymous Coward | more than 7 years ago | (#20032667)

The thing is, 'the world' is built on C/C++ and this won't change soon; everywhere you look it's C/C++ libs and stuff.
I'm trying desperately to move away from C++; it's dusty and a hell of a language with loads of problems, BUT the average neighborhood library has a C (or C++) interface. So either you fiddle around with more or less weird X-to-C call libraries, or you stick with C/C++. Sad, but that's the way it is ;/

Don't do any embedded development, do ya? (1)

Anonymous Meoward (665631) | more than 7 years ago | (#20033541)

In the embedded realm (not to mention kernel or driver space stuff for any OS), you won't be using much C++. Granted, I've used both in the embedded world, and I prefer C++ whenever I can get away with it. But that ain't often.

One of the problems with C++ in the embedded market is not the language itself, but the mindset of the developers. Most folks who do low-level stuff are not as concerned with code structure and organization as they are with the size and speed of the generated code. (Don't believe that? Try working under a tight schedule.) Many of them abhor C++ for its complexity, and more than a few in my experience also don't have enough experience with C++ to use it effectively anyway.

For example, when I worked on a platform that had to be up 24/7 (this wasn't something you'd buy from Best Buy, 'kay?), some enterprising soul tried his hand at C++ and put the following statement in a constructor:

delete this;

Brrr.

Not much C++ occurred in the organization after that one sneaked in.

Yeah, because getopt(3) is a real bottleneck (4, Insightful)

V. Mole (9567) | more than 7 years ago | (#20032331)

Does the phrase "reinvent the wheel" strike a chord with anyone?

It is if the linker complains about not finding it (4, Informative)

tepples (727027) | more than 7 years ago | (#20032591)

Yeah, because getopt(3) is a real bottleneck
getopt() is in the header <unistd.h>, which is in POSIX, not ANSI. POSIX facilities are not guaranteed to be present on W*nd?ws systems. It also handles only short options, not long options. For those, you have to use getopt_long() from <getopt.h>, which isn't even in POSIX.

Does the phrase "reinvent the wheel" strike a chord with anyone?
If the wheel isn't licensed appropriately, copyright law requires you to reinvent it. Specifically, using software under the GNU Lesser General Public License [gnu.org] in a proprietary program intended to run on a platform whose executables are ordinarily statically linked, such as a handheld or otherwise embedded system, is cumbersome.

MOD UP (0, Redundant)

ipjohnson (580042) | more than 7 years ago | (#20032865)

Wish I had some mod points, great reply.

Re:It is if the linker complains about not finding (2, Interesting)

tqbf (59350) | more than 7 years ago | (#20033011)

Are you seriously trying to argue that gperf is more portable than getopt?

Re:It is if the linker complains about not finding (1)

tepples (727027) | more than 7 years ago | (#20033075)

Are you seriously trying to argue that gperf is more portable than getopt?
I'm not arguing specifically in favor of gperf, but arguing generally that reinventing the standard library has its justifications at times.

Re:It is if the linker complains about not finding (1)

larry bagina (561269) | more than 7 years ago | (#20033357)

Here's something you've all been waiting for: the AT&T public domain source for getopt(3). It is the code which was given out at the 1985 UNIFORUM conference in Dallas. I obtained it by electronic mail directly from AT&T. The people there assure me that it is indeed in the public domain. [google.com]

There is no manual page. That is because the one they gave out at UNIFORUM was slightly different from the current System V Release 2 manual page. The difference apparently involved a note about the famous rules 5 and 6, recommending using white space between an option and its first argument, and not grouping options that have arguments. Getopt itself is currently lenient about both of these things: white space is allowed, but not mandatory, and the last option in a group can have an argument. That particular version of the man page evidently has no official existence, and my source at AT&T did not send a copy. The current SVR2 man page reflects the actual behavior of this getopt. However, I am not about to post a copy of anything licensed by AT&T.

I will submit this source to Berkeley as a bug fix.

I, personally, make no claims or guarantees of any kind about the following source. I did compile it to get some confidence that it arrived whole, but beyond that you're on your own.

Re:It is if the linker complains about not finding (0)

Anonymous Coward | more than 7 years ago | (#20033471)

on W*nd?ws systems
Watchawackabindows?
Wackinteluntilandows?
Winapackatindows?
Please! Expand that wildstar! Whatever could it match?

And the standard says... (5, Insightful)

Anonymous Coward | more than 7 years ago | (#20032399)

Good grief. What a strawman of an example.
Anyone writing or maintaining command line programs knows that they
should be using the API getopt() or getopt_long().
There are standards on how command line options and arguments are to be
processed. They should be followed for portability and code maintenance.

Re:And the standard says... (0)

iangoldby (552781) | more than 7 years ago | (#20032491)

Anyone writing or maintaining command line programs knows that they
should be using the API getopt() or getopt_long()...
Someone please mod parent up (not this).

I agree... (2, Insightful)

SuperKendall (25149) | more than 7 years ago | (#20032603)

There's a time and place for gperf - command line argument processing is not it!

Actually, I've never really come across a case where I knew ahead of time the whole universe of strings I would be accepting, and so never ended up using it - gperf is a great idea, but this seems to be a case of someone really looking hard to figure out where they could shoehorn gperf into just for the sake of using it.

Re:I agree... (1)

thogard (43403) | more than 7 years ago | (#20033197)

This whole discussion reminds me of the often quoted phrase "Premature optimisation is the root of all evil", but you bring up an interesting point that I disagree with.
There is a place for gperf in command line processing, it's just not for production programs. It is fine for experimental programs as a training exercise.

Re:And the standard says... (1)

The Vulture (248871) | more than 7 years ago | (#20032641)

From what I can see in the article, it's not meant to replace getopt/getopt_long.

I am currently writing an application (for my employer) where this may be useful. Although it uses command line parameters (via getopt_long), it also receives commands in ASCII over a network connection - that is what I believe this article targets.

Because the commands I receive can have almost any series of parameters in any sequence, however, I prefer to do what another poster here already stated - you look for keywords in a lookup table, and then call a function to handle whatever keywords come up afterwards. The suggestion of the article is that rather than iterating over a lookup table, you can use a hashing function to more quickly determine which keyword you are looking at.

The extra complexity of this method, however (having to use extra tools), makes me lean towards simple iteration - easier to code, and when you add a new token, it's a minimal change.

-- Joe

Re:And the standard says... (1)

Frankie70 (803801) | more than 7 years ago | (#20032821)


Anyone writing or maintaining command line programs knows that they
should be using the API getopt() or getopt_long().


There is no getopt or getopt_long in the C or C++ standard.

Functional Programming rules the world (0)

Anonymous Coward | more than 7 years ago | (#20032407)

OCaml for the win

Equivalent Python (0, Informative)

Anonymous Coward | more than 7 years ago | (#20032469)

import sys

def function_1(args):
    print('option a:', args)

def function_2(args):
    print('option b:', args)

# The variable "functions" is a Python dictionary; any callable can be a value.
# Built-in dictionaries already use fast hash-table lookups.
functions = {'a': function_1,
             'b': function_2}

if __name__ == '__main__':
    args = sys.argv[1:]
    func = functions[args[0]]
    func(args[1:])

Re:Equivalent Python (0)

Anonymous Coward | more than 7 years ago | (#20032513)

(1) Python dict() does not use a perfect hash function.

(2) In your example, the dictionary is built online rather than being compiled into the program.

(3) Your chosen language has no support for buffer overflows and is far too easy to understand and maintain.

Re:Equivalent Python (0)

Anonymous Coward | more than 7 years ago | (#20032807)

Pretty. However, if your purpose was to somehow show how Python is "superior" when it comes
to parsing command-line options, uhhhh... get real. Who cares? After all, my special language
simply handles command-line options with no code at all. It just figures it out from the options
you ask for in the program. MUCH smaller than your Python code. And this means..?

If, on the other hand, you were more interested in demonstrating how a Python program with
nice command line handling might talk to C or C++ for some function, I applaud you and
also recommend that you also explore:

* Boost::program_options
* Boost::Python
* SWIG
* Shed Skin (Python -> C++ compiler)

...have fun!

Re:Equivalent Python (0)

Anonymous Coward | more than 7 years ago | (#20033021)

But I like pretty. After using C++, Java, PHP, and ugghhh...VB for a while, I finally got around to learning Python...and I'm hooked. I haven't gotten to the point of handling command line options in the 2 wxPython apps I'm working on. When I saw the C++ code, I looked at 3 Python examples and synthesized the Python version. And posted it, cause it was so pretty.

If, on the other hand, you were more interested in demonstrating how a Python program with
nice command line handling might talk to C or C++ for some function, I applaud you and
also recommend that you also explore:

* Boost::program_options
* Boost::Python
* SWIG
* Shed Skin (Python -> C++ compiler) ...have fun!

Nah, to do that I'd have to import the ctypes [python.org] library. It would have added a few LOC. I'm interested in integrating C/C++ code if necessary for performance, but I haven't focused on that yet, preferring pure Python for its simplicity.

But, similar to Shed Skin, PyPy [codespeak.net] is pretty nifty. It's currently Python written on top of Python, but you can "translate" the high-level code to C, .Net, JVM, even Javascript (!). It's very similar to the pseudo-code and code generators in books like the Pragmatic Programmer. Crazy.

Joke? (0)

Anonymous Coward | more than 7 years ago | (#20032509)

What kind of joke is this? The example in listing 1 is using strtok() to do something it can't do, and even if it did what the authors intended, they wrote comments documenting something else.

Re:Joke? (0)

Anonymous Coward | more than 7 years ago | (#20032711)

probably H1B PhD.

Re:Joke? (0)

Anonymous Coward | more than 7 years ago | (#20032811)

> probably H1B PhD.

Yeah, Bill Gates just can't get enough of these guys. Put 5000 in a room for a year and they'll bang out total crap that you'll be forced to sell to reclaim your 'investment'. If you're really unlucky, they may even leave you with a real stinker like Windows Vista.

Correction... (1, Insightful)

Pedrito (94783) | more than 7 years ago | (#20032529)

Just about any relatively complicated software has dozens of available command-line options.

That should probably be rephrased to "Just about any relatively complicated software that inflicts command-lines on its users..."

This is clearly a very Unix-oriented post, as there are relatively few command-line Windows apps and few Windows GUI apps that accept command-lines. But this is also a topic that's about as old as programming itself, and clearly something that takes the "new" out of "news".

Re:Correction... (1)

AuMatar (183847) | more than 7 years ago | (#20032647)

Umm, most Windows apps accept command line inputs - it's just not the default way of using them. But type it in at the command line and you'd be surprised. A few that come to mind: VC++'s compiler and Internet Explorer.

Re:Correction... (1)

Ambiguous Puzuma (1134017) | more than 7 years ago | (#20032677)

You might be surprised. Command line options may not be featured prominently in Windows applications, but that doesn't mean they're not there. If you have Microsoft Visual Studio, for example, try "devenv /?" sometime. (For non-Windows users: Devenv.exe is the executable to start Visual Studio's IDE.)

Re:Correction... (1, Interesting)

Anonymous Coward | more than 7 years ago | (#20032679)

Hardly the case. Most of the win32 shit I've used accepts command lines. It's much simpler and a more powerful debugging tool than forcing a config file change for every attempt.

Re:Correction... (1)

Maniac-X (825402) | more than 7 years ago | (#20033639)

That's not true. Most Windows programs accept command-line arguments (just take a look at ANY game, as a matter of fact); they're simply not used often because most Windows users a) don't know they exist, b) wouldn't know how to do it without some detailed instruction, and c) would probably not see the point in trying it anyway.

Not for command lines ... (1)

FrnkMit (302934) | more than 7 years ago | (#20032731)

I haven't even read TFA, but I know that gperf isn't for command lines; getopt() (in its various forms) more than adequately does its job.

One real use of gperf and perfect hashes that I know of is in TAO (The ACE ORB), an implementation of CORBA. Since CORBA includes the class and method names as strings, a perfect hash speeds up each lookup of the actual routine to call.

In modern times, I can imagine gperf (or a Java/C#/Ruby/whatever port) speeding up SOAP or other XML-based protocols.

I like gperf, but... (0)

Anonymous Coward | more than 7 years ago | (#20032735)

...it's more than a little pointless to use it for command-line options, especially in C++. For
one thing, as others have pointed out, I have a hard time imagining a case in which command-line
parsing is a real bottleneck for any application. And, given that that's the case, having to write
lots of special functions and use extra tools for something that is a problem solved well through
freely-available libraries seems like something of a waste of time. I assume that the true purpose of
the article was to remind people of gperf.

Respectfully to the IBM authors, you might as well just use lex and perhaps yacc if you're
dealing with C and need to write a parser, or a library that does a much better job of handling
command-line options (such as GNU getopt) and their problems which range far beyond merely parsing
things.

With C++, you have those libraries available as well, but if you want to try other approaches, Boost
("http://www.boost.org") has a very nice command-line option library that also sports an expressive
notation for describing the options in code.

In any case, it's nice to see an article on gperf, but here it felt somewhat ridiculously applied.

Wrong in so many ways (4, Insightful)

geophile (16995) | more than 7 years ago | (#20032819)

Perfect hash functions are curiosities. If you have a static set of keys, then with enough work you can generate a perfect (i.e. collision-free) hash function. This has been known for many years. The applicability is highly limited, because you don't usually have a static set of keys, and because the cost of generating the perfect hash is usually not worth it.

Gperf might be reasonable as a perfect hash generator for those incredibly rare situations when the extra work due to a hash collision is really the one thing standing between you and acceptable performance of your application.

I thought maybe we were seeing a bad writeup, but no, it's the authors themselves who talk about the need for high-performance command-line processing, and give the performance of processing N arguments as O(N)*[N*O(1)]. I cannot conceive of a situation in which command-line processing is a bottleneck. And their use of O() notation is wrong (they are claiming O(N**2) -- which they really don't want to do, not least because it's wrong). O() notation shows how performance grows with input size. Unless they are worrying about thousands or millions of command-line arguments, O() notation in this context is just ludicrous.

I don't know why I'm going on at such length -- the extreme dumbness of this article just set me off.

Re:Wrong in so many ways (1)

pclminion (145572) | more than 7 years ago | (#20032937)

Gperf might be reasonable as a perfect hash generator for those incredibly rare situations when the extra work due to a hash collision is really the one thing standing between you and acceptable performance of your application.

The primary REAL use of gperf is generating keyword recognizers for language parsers. It's another tool in the same vein as lex and yacc.

Re:Wrong in so many ways (0)

Anonymous Coward | more than 7 years ago | (#20033319)

That is because you are too dumb to understand the article... the code becomes difficult to maintain with time as the number of if-else comparisons increases. The authors are specifically pointing to such scenarios, where the number of options and their parsing increases. gperf is simply a way to generate the code automatically. It is not the only way, and it wouldn't be too hard to write a similar function anyway.

Historically? (3, Insightful)

ClosedSource (238333) | more than 7 years ago | (#20032839)

"Command-line processing is historically one of the most ignored areas in software development."

This is like saying that walking is historically one of the most ignored areas in human transportation.

is this a joke? (2, Insightful)

oohshiny (998054) | more than 7 years ago | (#20032879)

If it's not, the author of that article should be kept as far away from writing software as possible; he epitomizes the attitude that so frequently gets C++ programmers into trouble.

gcc: the perfect candidate? (1)

e9th (652576) | more than 7 years ago | (#20032905)

Tons of options, but what do we see? Only stuff like

else if (! strncmp (argv[i], "-print-file-name=", 17))

Maybe they're just too scared of its present options processing to change it.

Is this a fucking joke? (2, Funny)

pclminion (145572) | more than 7 years ago | (#20032927)

Where's the Foot icon? Optimizing command line parsing? Oh God, my sides are splitting.

This is ridiculous (1)

Bluesman (104513) | more than 7 years ago | (#20032979)

First of all, how many programs have command line parsing as a bottleneck?

Secondly, they should put this functionality into GCC instead, so that it creates a perfect hash for any large switch statement.

Re:This is ridiculous (0)

Anonymous Coward | more than 7 years ago | (#20033201)

Switch statements use integer keys, so you don't need a hash table. You can directly index into a jump table (assuming the indexes are reasonably compact; if your only cases are 0, a million, and a billion, obviously a compiler would rather use if-else statements). It's very low overhead, which was the whole point of including a switch statement to begin with. :)

Re:This is ridiculous (1)

larry bagina (561269) | more than 7 years ago | (#20033207)

gperf is concerned with string hashing; C switch statements use integers. All modern C compilers (even gcc) look at the case density and build an indirection table or a set of if/else branches (or sometimes both).

wow, how pointless is that (1, Insightful)

Anonymous Coward | more than 7 years ago | (#20033283)

I've probably used more time typing this message than every program I've ever run has used parsing command line arguments.

gperf for options?!?!?! (0)

Anonymous Coward | more than 7 years ago | (#20033421)

?!?!?\N{INTERROBANG}

Talk about overkill. gperf is for parsers. Yeah, I know getopt is itself a parser, but I think anyone who's done real programming knows what I mean.

Besides, perfect hashes are old and busted. Cuckoo hashes give you almost identical performance and are far more flexible.

Another approach - parseargs (2, Interesting)

argent (18001) | more than 7 years ago | (#20033443)

Something Eric Allman wrote many moons ago. I found it and modified it to support "native" command line syntax on MS-DOS, VMS, and AmigaDOS, and added some support for improved self-documentation... and then Brad Appleton saw it and rapidly enhanced it to support a plethora of shells and interfaces until it took up 10 posts in comp.sources.misc.

The following two directories should bring it up to the latest version I know of.

This is not efficient, mind you. Command line parsing doesn't generally need to be efficient, even by my miserly standards, honed when a PDP-11 was something you hoped to upgrade to... some day...

ftp://ftp.uu.net/usenet/comp.sources.misc/volume29/parseargs/ [uu.net]
ftp://ftp.uu.net/usenet/comp.sources.misc/volume30/parseargs/ [uu.net]

PARSEARGS
 
                        extracted from Eric Allman's
 
                            NIFTY UTILITY LIBRARY
 
                          Created by Eric P. Allman
                            <eric@Berkeley.EDU>
 
                        Modified by Peter da Silva
                            <peter@Ferranti.COM>
 
                  Modified and Rewritten by Brad Appleton
                          <brad@SSD.CSD.Harris.COM>
Brad's latest work in this area seems to be here:

http://www.cmcrossroads.com/bradapp/ftp/src/libs/C++/CmdLine.html [cmcrossroads.com]

http://www.cmcrossroads.com/bradapp/ftp/src/libs/C++/Options.html [cmcrossroads.com]

Silly (1)

m.dillon (147925) | more than 7 years ago | (#20033559)

This is kinda silly. If you only have a few keywords you don't need anything sophisticated. If you have more than a few but not more than a few dozen, it's usually easiest just to arrange them in a linear array and do an index lookup based on the first character to find the starting point for your scan. More than that and you will want to hash them or arrange them in some sort of topology such as a red-black tree.

Generally speaking, hashes are very CPU- and cache-inefficient beasts, especially if one could otherwise reap the benefit of the locality of reference you get with other schemes. Hashes are easy to implement, though, so if you have a lot of keywords and there is either no locality of reference anyway or you don't care about the performance, a hash works just fine.

Insofar as strings go, once you get beyond a certain point it's easiest to just hash the string on the front-end, deal with any collisions on the front-end as well (i.e. implement a string table and modify the hash value for one of the strings if a collision occurs), and then simply reference the string via its hash value in the remainder of the program instead of actually doing any further string comparisons. As an extension of this, one can use a larger 64-bit hash and consider any collisions to be fatal. This is extremely viable for a language parser, given that the chances of a collision actually occurring are so low you might get only one, or even zero, across the entire domain of source code in existence today.

If you have a fixed set of keywords, then a 16 or 32 bit hash is usually sufficient to avoid collisions. At this point you just generate a header file with the values and switch on them. e.g. hv = hash(str); switch(hv) { case KEYWORD_FOR: ... case ... }. This is equivalent to the use of some sort of data structure but it winds up being coded and optimized directly by the compiler, and it's very easy to understand the resulting source code.

-Matt

And it's a gpl tool (1)

Suicyco (88284) | more than 7 years ago | (#20033713)

Which means that using it at the command line is "linking" it. Doing so, of course, means your upstream code must be GPL as well, ad infinitum. Sorry, but the bulk of C/C++ code out there is non-GPL licensed and therefore can take no advantage of tools such as this.