
Don't Overlook Efficient C/C++ Cmd Line Processing

CmdrTaco posted more than 7 years ago | from the also-don't-eat-yellow-snow dept.

Programming 219

An anonymous reader writes "Command-line processing is historically one of the most ignored areas in software development. Just about any relatively complicated piece of software has dozens of available command-line options. The GNU tool gperf is a "perfect" hash function generator: for a given set of user-provided strings, it generates C/C++ code for a hash table, a hash function, and a lookup function. This article provides a good discussion of how to use gperf for effective command-line processing in your C/C++ code."
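As a rough illustration of the workflow the summary describes (the option names and struct below are made up, not taken from the article), a gperf input file pairs each keyword with user data, and the tool emits the hash and lookup code:

```
%{
/* options.gperf -- generate with: gperf -t options.gperf > options.c */
#include <string.h>
%}
struct option_entry { const char *name; int id; };
%%
--help, 1
--version, 2
--verbose, 3
--output, 4
%%
```

Running gperf with -t on this file would then emit a hash() function and an in_word_set() lookup that returns a pointer to the matching option_entry, or NULL, in constant time for this fixed keyword set.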


i tooted (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#20032233)

~8 i r dobernala geeklord terd on my hunnies they heart this! *l*

Re:i tooted (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#20032255)

i tooted
Beans, beans, the musical fruit. The more you eat, the more you toot.

Speed in options parsing? (5, Insightful)

tot (30740) | more than 7 years ago | (#20032263)

I would not consider the speed of command-line option processing to be a bottleneck in any application; the overhead of starting the program is far greater.

Re:Speed in options parsing? (3, Insightful)

ScrewMaster (602015) | more than 7 years ago | (#20032279)

I'd say the speed of human motor activity is an even greater limiting factor.

Re:Speed in options parsing? (3, Informative)

pete-classic (75983) | more than 7 years ago | (#20032359)

What a limited point of view. See "man system", for example.

-Peter

Re:Speed in options parsing? (2, Insightful)

Anonymous Coward | more than 7 years ago | (#20032291)

It's still handy to have a fairly comfortable way of generating code that does things needed every time (or at least very, very often) in an easily applicable and very optimized way. I like it.

Re:Speed in options parsing? (4, Informative)

ChronosWS (706209) | more than 7 years ago | (#20032301)

Indeed, what the hell? Now you have to have another tool and another source file for what is essentially declaring a dictionary in C++, which should be in any good developer's library? Yeesh.

If you don't like the nasty nested ifs, make the keys in your dictionary the command-line options and the values delegates, then just loop through the list of options passed on the command line, invoking the delegate as appropriate. That eliminates the ifs, there are no switch statements either, and each of your command-line arguments is now handled by a function dedicated to it, bringing all the benefits of compartmentalizing your code rather than stringing it out in one huge processing function.

Broken handling of vtables in linkers (4, Informative)

tepples (727027) | more than 7 years ago | (#20032439)

Now you have to have another tool and another source file for what is essentially declaring a dictionary in C++, which should be in any good developer's library?
Due to the brokenness of how some linkers handle virtual method lookup tables, using anything from the C++ standard library tends to bring in a large chunk of dead code [wikipedia.org] from the standard library. I compiled hello-iostream.cpp using MinGW and the executable was over 200 KiB after running strip, compared to the 6 KiB executable produced from hello-cstdio.cpp. Sometimes NIH syndrome produces runtime efficiency, and on a handheld system, efficiency can mean the difference between fitting your app into widely deployed hardware and having to build custom, much more expensive hardware.

Re:Broken handling of vtables in linkers (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#20032517)

I compiled hello-iostream.cpp using MinGW and the executable was over 200 KiB after running strip, compared to the 6 KiB executable produced from hello-cstdio.cpp.
HOLY SHIT! 194KB BIGGER?! HOW WILL YOU EVER FIND THE SPACE FOR SUCH A HUGE EXECUTABLE?!?!

All the world is not a PC (5, Insightful)

tepples (727027) | more than 7 years ago | (#20032623)

HOLY SHIT! 194KB BIGGER?! HOW WILL YOU EVER FIND THE SPACE FOR SUCH A HUGE EXECUTABLE?!?!
I develop for a battery-powered computer with 384 KiB of RAM. In such an environment, what you appear to sarcastically call a "mere couple hundred kilobytes" is a bigger deal than it is on a personal computer manufactured in 2007.

Re:All the world is not a PC (1)

sholden (12227) | more than 7 years ago | (#20032823)

And you do so using MinGW and c++?

devkitARM (3, Informative)

tepples (727027) | more than 7 years ago | (#20032903)

And you do so using MinGW and c++?
Yes, I do so with devkitARM [devkitpro.org] (a cross-compiling GCC toolchain that is itself compiled with MinGW) and C++.

Re:devkitARM (1)

sholden (12227) | more than 7 years ago | (#20033067)

What the toolkit is compiled with is irrelevant. You're not using it unless you are compiling code targeted at MS Windows, which I don't think you are. Doing the iostream-versus-stdio hello world on local gcc gives a difference of 496 bytes, hence my guess that the way MinGW links libraries might be the reason for the bloat. And since MinGW targets win32, bloat is simply not an issue.

Byte counts when compiled with devkitARM (1)

tepples (727027) | more than 7 years ago | (#20033251)

What the toolkit is compiled with is irrelevant. You're not using it unless you are compiling code targeted to MS Windows, which I don't think you are.
I knew that. But I have generally seen overheads of the same magnitude when using standard C++ libraries on devkitARM as on MinGW. I just tried it on the GBA: 5,156 bytes for hello-world.mb, which just pushes a C string straight into agbtty_puts(), and 253,652 bytes for hello++.mb, which pushes output through a std::ostringstream and then into agbtty_puts(). (The limit for a .mb executable is 262,144 bytes, as the other 128 KiB of RAM in the system is specialized.)

Doing the iostream versus stdio hello world on local gcc gives a difference of 496 bytes
What "local" platform are you talking about? Does it use a dynamically linked C++ standard library?

Re:All the world is not a PC (0, Flamebait)

Urusai (865560) | more than 7 years ago | (#20033049)

I'd say a bigger deal is your pretentious use of kibibytes (KiB).

Re:All the world is not a PC (1, Interesting)

Anonymous Coward | more than 7 years ago | (#20033165)

"I develop for a battery-powered computer with 384 KiB of RAM. In such an environment, what you appear to sarcastically call a "mere couple hundred kilobytes" is a bigger deal than it is on a personal computer manufactured in 2007."

I fail to see how this is a strong argument in this discussion. How many of the embedded tools you write actually _do_ command-line processing? If they do, why not invest in more (both memory- and time-) efficient ways to do IPC than the command line?

Character encoding conversion (2, Informative)

tepples (727027) | more than 7 years ago | (#20033723)

How many of these embedded tools you write actually _do_ command line processing?
None yet, but they do handle other things that involve dictionaries, such as character encoding conversion. A program designed to move items back and forth between a town in Animal Crossing (for Nintendo GameCube) and a town in Animal Crossing: Wild World (for Nintendo DS) needs to be able to understand the encodings of character names and town names that these games use, possibly by converting between their proprietary 8-bit codecs and UTF-8.

why don't you invest in more (both memory- and time-) efficient ways to do IPC than the command line?
Because the command line, pipes, and sockets are the most obvious ways for two programs to communicate if their copyright licenses prohibit them from being linked together into one executable.

only relevent to static linking (4, Informative)

sentientbrendan (316150) | more than 7 years ago | (#20033479)

It sounds like the author is statically linking his library and running on an embedded system. It is not surprising in that case that the C++ standard library brings in much more code than the C standard library, but it should be made clear that this is not relevant to desktop developers, pretty much all of whom dynamically link with glibc.

Again, to be clear, dynamically linking with the c++ standard library is not going to increase your executable size. Please don't try to roll your own code that exists in the standard library. It is a real nuisance when people do that.

I should qualify that by saying that template instantiations do (of course) increase executable size, but that they do so no more than if you had rolled your own.

Which platform uses dynamic libstdc++? (2, Insightful)

tepples (727027) | more than 7 years ago | (#20033817)

It is not surprising in that case that the c++ standard library brings in much more code than the c standard library, but it should be made clear that it is not relevant to desktop developers, pretty much all of which dynamically link with glibc.
On MinGW, the port of GCC to Windows OS, my programs dynamically link with msvcrt, not glibc. Also on MinGW, libstdc++ is static, just like in the embedded toolchain. Are you implying that one of the C++ toolchains for Windows uses a dynamic libstdc++? Which toolchain for which operating system that is widely deployed on home desktop computers are you talking about?

Re:Speed in options parsing? (1)

hxnwix (652290) | more than 7 years ago | (#20032403)

Except on Windows XP, where pipe performance degraded an order of magnitude as compared to Windows 2000.

Re:Speed in options parsing? (5, Funny)

Anonymous Coward | more than 7 years ago | (#20032473)

You're not a real programmer if you won't over-optimize irrelevant parts of your code.

Re:Speed in options parsing? (4, Funny)

Maniac-X (825402) | more than 7 years ago | (#20033515)

Klingon function calls do not have 'parameters' - they have 'arguments.' AND THEY ALWAYS WIN THEM!

Re:Speed in options parsing? (3, Insightful)

canuck57 (662392) | more than 7 years ago | (#20032479)

I would not consider speed of command line option processing to be bottleneck in any application, the overhead of starting of the program is far greater.

You're just experiencing this with Java, Perl, or some other high-overhead bloated runtime. When people pull out a heavyweight needing a 90 MB VM, or a 5-10 MB base library calling a cat's breakfast of shared libraries, I would agree. But let's take C-based awk, for example: it is only an 80 KB draw. It runs fast, is nice and general-purpose, and does a good job of what it was designed to do. It can be pipelined in and out and used directly on the command line, as it has proper support for stdin, stdout, and stderr. On my system, it takes only 10 disk blocks to load.

While fewer people are proficient at it, C/C++ will outlast us all as a language. Virtually every commodity computer today uses it at its core. Many others have come and gone, yet all our OSes and scripting tools rely on it. So any doomsday predictions would be premature, and if you want fast, efficient, lean code, you do C/C++....

Re:Speed in options parsing? (0, Insightful)

Anonymous Coward | more than 7 years ago | (#20032573)

C/C++ will outlast us all for a language.
There's no such language as C/C++.

Re:Speed in options parsing? (2, Insightful)

Anonymous Coward | more than 7 years ago | (#20032485)

Indeed. The applications of perfect hashing (and minimal perfect hashing) are quite limited. Basically it only makes sense if you need to quickly identify strings from a fixed, finite set of strings known at compile time. And, as with all optimizations, only if that part of your program is a bottleneck or you are prepared to optimize all other aspects of your program as well.

The traditional example application for perfect hashing was identifying keyword tokens when building a compiler, but for complex modern languages like C++ parsing source code is just a very tiny fraction of the compilation process. And even that scenario makes more sense than parsing command line options.

I doubt there is a single application that significantly benefits from hashed lookup of command line options. Suggesting that it makes sense to spend your time increasing the complexity of your application for a practically immeasurable improvement in performance is insanity.

Re:Speed in options parsing? (1)

eokyere (685783) | more than 7 years ago | (#20032577)

as somebody already mentioned, speed in options parsing is pretty useless, and I could use commons-cli (in Java) or the Groovy CliBuilder for cmdline options that arguably look cleaner and more accessible to a lot more people:

def cli = new CliBuilder(usage: "foo [args] baz")
cli.i(argName: "path", longOpt: "input", args: 1, required: false, "src")
cli.o(argName: "path", longOpt: "output", args: 1, required: false, "dest")
cli.h(longOpt: "help", "this message")
def options = cli.parse(args)
if (!options || !options.i || options.h) {
    println "foobaz ver 0.0.1"
    cli.usage()
    return
}
// rest of code

Re:Speed in options parsing? (4, Funny)

ai3 (916858) | more than 7 years ago | (#20032857)

You must not have seen the recent proposal for GNU tools options, which will require four dashes instead of two and a minimum of four words per option. Under a UN/EU funded program to ease the transition to intelligent machines, developers are rewarded for implementing full-sentence options and/or prose. But initial experiments showed that many users were unwilling to wait for the parsing of the command "remove-files --recursively-from-root-directory --do-not-ask-for-confirmation-just-delete --i-really-want-this!" just to be 1337, which led to whatever development efforts are mentioned in the article, which I didn't read.

Re:Speed in options parsing? (1, Funny)

Anonymous Coward | more than 7 years ago | (#20032941)

Yes, this will help maintainability but a consistent and standardized naming convention is helpful too.

function ifYouThoughtYouWereGoingToEditThisFileUsingATerminalBasedEditorThenYouNeedToThinkAgainOhYeahIIndentUsingTabsToo()
{
/* */
}

Too much (3, Insightful)

bytesex (112972) | more than 7 years ago | (#20032269)

I'm not sure that for the usually simple task of command line processing, I'd like to learn a whole new lex/yacc syntax thingy.

Joke? (0)

Anonymous Coward | more than 7 years ago | (#20032277)

This has to be a joke? Sheesh. Someone found a "new" toy?

Re:Joke? (4, Insightful)

iangoldby (552781) | more than 7 years ago | (#20032539)

Someone found a "new" toy?
Well I for one won't be using this to process command-line arguments (that's what getopt() and getopt_long() are for), but it is certainly useful to know of a tool that I can use to generate a perfect hash. The next time I need some simple but efficient code to quickly discriminate between a fixed set of strings, I'll know to Google for gperf. (Before I read this article I didn't even know it existed.)

Re:Joke? (1)

bumby (589283) | more than 7 years ago | (#20033121)

This was actually what I first thought it was to be used for, before I read the comments. I thought the summary was about how complicated command-line tools are to use, with all their options, and that gperf was an example of such a program.

Maybe overkill BUT... (1)

Derek Loev (1050412) | more than 7 years ago | (#20032319)

it does create some good-looking code.

C++ I get (-1, Troll)

WED Fan (911325) | more than 7 years ago | (#20032323)

O.k., I get C++. It's still viable, but C? Since I moved to C++ way, way, way back when, I haven't had much use for C; hell, I've even moved away from C++. So I haven't tracked projects or systems that still use C. Aside from those still doing C on ATMELs and PICs, who is using it?

Re:C++ I get (4, Insightful)

Anonymous Coward | more than 7 years ago | (#20032393)

I do. On MIPS, ARM, PPC, x86, and all the other embedded stuff. I don't think C will ever die - it's the universal assembler language.

Re:C++ I get (5, Funny)

V. Mole (9567) | more than 7 years ago | (#20032419)

There's this little project of which you may have heard: http://www.kernel.org/ [kernel.org]

Re:C++ I get (0)

Goalie_Ca (584234) | more than 7 years ago | (#20033107)

It only uses a 'subset' of c++ called c ;)

Re:C++ I get (1, Interesting)

hxnwix (652290) | more than 7 years ago | (#20032449)

Oh, gee, well, nobody except:

1) Every linux kernel developer
2) Every *BSD kernel developer
3) John Carmack, for the core of every ID engine up to and possibly beyond Doom3
4) You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).

Re:C++ I get (4, Interesting)

mce (509) | more than 7 years ago | (#20032567)

You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).

Excuse me???? That was not even true anymore when I started using C++, back in 1992. There are features in the C++ standard that are so extremely difficult to correctly implement in standard-compliant C that it's a complete waste of effort trying to pass via C while compiling. Exception handling comes to mind as the prime example. A failed attempt to support exceptions was the reason why Cfront 4.0 was abandoned. Note that 3.0 was released as early as 1991. The last Cfront-based compiler I had the horror of using was HP's CC. It was superseded by the new native aCC by 1994 at the latest.

By the way, I used to write C/C++ compilation/optimisation stuff for a living, so I guess I know something about the topic.... :-)

Re:C++ I get (1)

hxnwix (652290) | more than 7 years ago | (#20032815)

I used to write C/C++ compilation/optimisation stuff for a living, so I guess I know something about the topic....
Good guess. [wikipedia.org] Name decoration and limited knowledge of c++'s origins led me to conclude that most C++ compilers still act as front ends. So, we don't all use C anymore...

Re:C++ I get (1)

StripedCow (776465) | more than 7 years ago | (#20032909)

C is indeed not a good intermediate language for the reasons you mentioned.
But C-- may be (http://cminusminus.org/)
Perhaps the kernel developers should be coding in *that* language :-)

Re:C++ I get (0)

Anonymous Coward | more than 7 years ago | (#20033029)

At least the Comeau C++ compiler [comeaucomputing.com] still generates C code, and is known as one of the most portable and standard-compliant C++ compilers (including support for exported templates!). So compiling C++ to C is definitely a viable strategy (although I can understand compiler vendors that want to offer a complete toolchain take a different approach).

Re:C++ I get (1)

pclminion (145572) | more than 7 years ago | (#20033527)

There are features in the C++ standard that are so extremely difficult to correctly implement in standard compliant C that it's a complete waste of effort trying to pass via C while compiling.

The only thing I can imagine that would be hard to map directly onto C would be exceptions. Can you confirm that this is what you mean? Because nothing else comes to mind that would be "extremely difficult" to implement.

Even then, it's possible to emulate C++-style exceptions in C. I've done it -- the best description I can think of is "horrifically ugly." But it's possible.

Re:C++ I get (3, Informative)

mce (509) | more than 7 years ago | (#20033809)

Of course C++ exceptions are what I meant. What else would I mean when using the word "exceptions" in this context?

And yes, C++ exceptions can be expressed in C. After all, C is a glorified assembler, and the resulting code from C++ translation is assembler as well. It all depends on the level of abstraction at which the C code is written and on the amount of ugliness/inefficiency you're willing to take on board (and also the trade-off between the two). But that's not the point. The point of this thread is that nowadays it makes no sense to make use of this capability in a C++ compiler. Especially not when considering that a user of a C++ compiler wants more than just a compiler: he also wants a debugger that is able to meaningfully link up the binary and the original C++ source. If you're a C++ compiler vendor, using C as an IL does nothing but complicate your own life. Twice.

Re:C++ I get (4, Informative)

Enselic (933809) | more than 7 years ago | (#20032619)

You are wrong about 3):

The process of building the new engine went much more smoothly than anything we have done before, because I was able to do all the groundwork while the rest of the company worked on TeamArena. By the time they were ready to work on it, things were basically functional. I did most of the early development work with a gutted version of Quake 3, which let me write a brand new renderer without having to rewrite file access code, console code, and all the other subsystems that make up a game. After the renderer was functional and the other programmers came off of TA and Wolf, the rest of the codebase got rewritten. Especially after our move to C++, there is very little code remaining from the Q3 codebase at this point.

Source: http://archive.gamespy.com/e32002/pc/carmack/ [gamespy.com]


And 4) as well:

Historically, compilers for many languages, including C++ and Fortran, have been implemented as "preprocessors" which emit another high level language such as C. None of the compilers included in GCC are implemented this way; they all generate machine code directly. This sort of preprocessor should not be confused with the C preprocessor, which is an integral feature of the C, C++, Objective-C and Objective-C++ languages.

Source: http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/G_002b_002b-and-GCC.html [gnu.org]

Re:C++ I get (1)

mechsoph (716782) | more than 7 years ago | (#20032643)

You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64)

GCC parses C++ to its tree IR; there is no translation to C.

Wrong about 4 (or at least, very out of date) (1)

jdennett (157516) | more than 7 years ago | (#20032747)

It's been many years since most C++ compilers used C as an intermediate language. CFront did, and some EDG-based compilers do, but most current C++ compilers do not.

C does have its strengths, such as the relative simplicity of C90 and its lack of dependency on sophisticated compilers and runtimes, but its use as an IL is largely historical.

Re:C++ I get (1)

DirtySouthAfrican (984664) | more than 7 years ago | (#20032919)

I don't think C++ compilers compile to C anymore... I know Borland's TPC did this, but that was back when C++ was built on top of C.

Re:C++ I get (1)

PerlDudeXL (456021) | more than 7 years ago | (#20033187)

You, whenever you compile C++ code, as it is compiled to C before machine code

One of my Computer Science profs said something similar. He argued that C and C++ are basically the same outdated shit and professionals would only use Java in real-world applications. The best thing: he ran Ubuntu and all sorts of Gnome stuff on his laptop.

Re:C++ I get (2, Insightful)

iangoldby (552781) | more than 7 years ago | (#20032483)

I use C for any low-level programming project that doesn't warrant an object-oriented approach.

The trick is to identify the best tool for the job.

I'm doing it. (1)

www.sorehands.com (142825) | more than 7 years ago | (#20032503)

I'm currently rewriting Post Road Mailer, which is in C on OS/2. I also wrote an e-mail scanner. It all depends on what you need to do.

I did a phone interview for a job a couple of years ago: remote underwater sensor equipment. It had to run on battery; do you think they would have written it in C or C++? It would turn on the hard drive once in a while, once the flash drive was full.

The more you abstract something, the less efficient it becomes.

There are millions of lines of COBOL code still running.

"The Jenolan could probably fly rings around the Enterprise on impulse." Geordi LaForge.

Re:I'm doing it. (1)

DreadSpoon (653424) | more than 7 years ago | (#20032725)

The more you abstract something, the less efficient it becomes.
This is not at all true, especially not today. I'd trust an abstract container library to optimize its internals far more than I'd trust you or almost any other individual developer to do the same.

I trust my C compiler to get the very many high-level optimizations required by today's CPUs right more than I'd trust you or almost any other individual developer to do the same.

Yeah, sometimes those high level libraries or languages get things wrong, but that's not a given just because they're more abstract. It's merely an implementation bug.

If you had some C code that was inefficient compared to assembler code, then you just don't know how to write efficient C code or you were using a shit compiler.

I disagree (1)

www.sorehands.com (142825) | more than 7 years ago | (#20033389)

While I agree that most modern compilers can out-optimize the average programmer, you are still looking at generalities.

Both compilers and abstract container classes have to deal with generalities which may not apply to YOUR specific case. The class writer does not know the specific case or conditions (presuming you are not writing the class for that specific condition). A class writer has to (or should) check arguments and conditions, whereas if you know they have already been checked (and are damn well sure of it) you can skip that.

When writing an abstraction layer, you are adding a layer.

On the other hand, a good programmer would not try to optimize a bubble sort. I was working on a resource compiler (in DOS) back in 1989. It would take 45 minutes to 'compile'. I rewrote it to take about 3:15 minutes. But during the rewrite my AVL tree insert was taking forever: it would allocate the memory to do the insert, and when it found the word was already in the tree, it would free it. Deferring the allocation fixed that.

If you know the entire program/system you can better optimize.

Re:C++ I get (1, Interesting)

Anonymous Coward | more than 7 years ago | (#20032553)

Re:C++ I get (1)

AuMatar (183847) | more than 7 years ago | (#20032625)

Pretty much every embedded program in existence. Own a printer? That's several hundred thousand lines of C in there.

Re:C++ I get (0)

Anonymous Coward | more than 7 years ago | (#20032667)

The thing is, 'the world' is built on C/C++ and this won't change soon; everywhere you look it's C/C++ libs and stuff.
I'm trying desperately to move away from C++; it's dusty and a hell of a language with loads of problems, BUT the average neighborhood library has a C (or C++) interface. So either you fiddle around with more or less weird X-to-C call libraries, or you stick with C/C++. Sad, but that's the way it is ;/

Don't do any embedded development, do ya? (1)

Anonymous Meoward (665631) | more than 7 years ago | (#20033541)

In the embedded realm (not to mention kernel or driver space stuff for any OS), you won't be using much C++. Granted, I've used both in the embedded world, and I prefer C++ whenever I can get away with it. But that ain't often.

One of the problems with C++ in the embedded market is not the language itself, but the mindset of the developers. Most folks who do low-level stuff are not as concerned with code structure and organization as they are with the size and speed of the generated code. (Don't believe that? Try working under a tight schedule.) Many of them abhor C++ for its complexity, and more than a few in my experience also don't have enough experience with C++ to use it effectively anyway.

For example, when I worked on a platform that had to be up 24/7 (this wasn't something you'd buy from Best Buy, 'kay?), some enterprising soul tried his hand at C++ and put the following statement in a constructor:

delete this;

Brrr.

Not much C++ occurred in the organization after that one sneaked in.

Yeah, because getopt(3) is a real bottleneck (4, Insightful)

V. Mole (9567) | more than 7 years ago | (#20032331)

Does the phrase "reinvent the wheel" strike a chord with anyone?

It is if the linker complains about not finding it (4, Informative)

tepples (727027) | more than 7 years ago | (#20032591)

Yeah, because getopt(3) is a real bottleneck
getopt() is in the header <unistd.h>, which is in POSIX, not ANSI. POSIX facilities are not guaranteed to be present on W*nd?ws systems. It also handles only short options, not long options. For those, you have to use getopt_long() from <getopt.h>, which isn't even in POSIX.

Does the phrase "reinvent the wheel" strike a chord with anyone?
If the wheel isn't licensed appropriately, copyright law requires you to reinvent it. Specifically, using software under the GNU Lesser General Public License [gnu.org] in a proprietary program intended to run on a platform whose executables are ordinarily statically linked, such as a handheld or otherwise embedded system, is cumbersome.

MOD UP (0, Redundant)

ipjohnson (580042) | more than 7 years ago | (#20032865)

Wish I had some mod points, great reply.

Re:It is if the linker complains about not finding (2, Interesting)

tqbf (59350) | more than 7 years ago | (#20033011)

Are you seriously trying to argue that gperf is more portable than getopt?

Re:It is if the linker complains about not finding (1)

tepples (727027) | more than 7 years ago | (#20033075)

Are you seriously trying to argue that gperf is more portable than getopt?
I'm not arguing specifically in favor of gperf, but arguing generally that reinventing the standard library has its justifications at times.

Re:It is if the linker complains about not finding (1)

larry bagina (561269) | more than 7 years ago | (#20033357)

Here's something you've all been waiting for: the AT&T public domain source for getopt(3). It is the code which was given out at the 1985 UNIFORUM conference in Dallas. I obtained it by electronic mail directly from AT&T. The people there assure me that it is indeed in the public domain. [google.com]

There is no manual page. That is because the one they gave out at UNIFORUM was slightly different from the current System V Release 2 manual page. The difference apparently involved a note about the famous rules 5 and 6, recommending using white space between an option and its first argument, and not grouping options that have arguments. Getopt itself is currently lenient about both of these things: white space is allowed, but not mandatory, and the last option in a group can have an argument. That particular version of the man page evidently has no official existence, and my source at AT&T did not send a copy. The current SVR2 man page reflects the actual behavior of this getopt. However, I am not about to post a copy of anything licensed by AT&T.

I will submit this source to Berkeley as a bug fix.

I, personally, make no claims or guarantees of any kind about the following source. I did compile it to get some confidence that it arrived whole, but beyond that you're on your own.

Re:It is if the linker complains about not finding (0)

Anonymous Coward | more than 7 years ago | (#20033471)

on W*nd?ws systems
Watchawackabindows?
Wackinteluntilandows?
Winapackatindows?
Please! Expand that wildstar! Whatever could it match?

And the standard says... (5, Insightful)

Anonymous Coward | more than 7 years ago | (#20032399)

Good grief. What a strawman of an example.
Anyone writing or maintaining command line programs knows that they
should be using the API getopt() or getopt_long().
There are standards on how command line options and arguments are to be
processed. They should be followed for portability and code maintenance.

Re:And the standard says... (0)

iangoldby (552781) | more than 7 years ago | (#20032491)

Anyone writing or maintaining command line programs knows that they
should be using the API getopt() or getopt_long()...
Someone please mod parent up (not this).

I agree... (2, Insightful)

SuperKendall (25149) | more than 7 years ago | (#20032603)

There's a time and place for gperf - command line argument processing is not it!

Actually, I've never really come across a case where I knew ahead of time the whole universe of strings I would be accepting, and so never ended up using it - gperf is a great idea, but this seems to be a case of someone really looking hard to figure out where they could shoehorn gperf into just for the sake of using it.

Re:I agree... (1)

thogard (43403) | more than 7 years ago | (#20033197)

This whole discussion reminds me of the often quoted phrase "Premature optimisation is the root of all evil", but you bring up an interesting point that I disagree with.
There is a place for gperf in command line processing, it's just not for production programs. It is fine for experimental programs as a training exercise.

Re:And the standard says... (1)

The Vulture (248871) | more than 7 years ago | (#20032641)

From what I can see in the article, it's not meant to replace getopt/getopt_long.

I am currently writing an application (for my employer) where this may be useful. Although it uses command line parameters (via getopt_long), it also receives commands in ASCII over a network connection - that is what I believe this article targets.

Because the commands I receive can have almost any series of parameters in any sequence, however, I prefer to do what another poster here already stated - you look for keywords in a lookup table, and then call a function to handle whatever keywords come up afterwards. The suggestion of the article is that rather than iterating over a lookup table, you can use a hashing function to more quickly determine which keyword you are looking at.

The extra complexity of this method, however (having to use extra tools), makes me lean towards simple iteration - easier to code, and when you add a new token, it's a minimal change.

-- Joe

Re:And the standard says... (1)

Frankie70 (803801) | more than 7 years ago | (#20032821)


Anyone writing or maintaining command line programs knows that they
should be using the API getopt() or getopt_long().


There is no getopt or getopt_long in the C or C++ standard.

Functional Programming rules the world (0)

Anonymous Coward | more than 7 years ago | (#20032407)

OCaml for the win

Equivalent Python (0, Informative)

Anonymous Coward | more than 7 years ago | (#20032469)

import sys

def function_1(args):
    print('option a:', args)

def function_2(args):
    print('option b:', args)

# The variable "functions" is a Python dictionary; any callable can be a value.
# Built-in dictionaries already use fast hash-table lookups.
functions = {'a': function_1,
             'b': function_2}

if __name__ == '__main__':
    args = sys.argv[1:]
    func = functions[args[0]]
    func(args[1:])

Re:Equivalent Python (0)

Anonymous Coward | more than 7 years ago | (#20032513)

(1) Python dict() does not use a perfect hash function.

(2) In your example, the dictionary is built online rather than being compiled into the program.

(3) Your chosen language has no support for buffer overflows and is far too easy to understand and maintain.

Re:Equivalent Python (0)

Anonymous Coward | more than 7 years ago | (#20032807)

Pretty. However, if your purpose was to somehow show how Python is "superior" when it comes
to parsing command-line options, uhhhh... get real. Who cares? After all, my special language
simply handles command-line options with no code at all. It just figures it out from the options
you ask for in the program. MUCH smaller than your Python code. And this means..?

If, on the other hand, you were more interested in demonstrating how a Python program with
nice command line handling might talk to C or C++ for some function, I applaud you and
also recommend that you also explore:

* Boost::program_options
* Boost::Python
* SWIG
* Shed Skin (Python -> C++ compiler)

...have fun!

Re:Equivalent Python (0)

Anonymous Coward | more than 7 years ago | (#20033021)

But I like pretty. After using C++, Java, PHP, and ugghhh...VB for a while, I finally got around to learning Python...and I'm hooked. I haven't gotten to the point of handling command line options in the 2 wxPython apps I'm working on. When I saw the C++ code, I looked at 3 Python examples and synthesized the Python version. And posted it, cause it was so pretty.

If, on the other hand, you were more interested in demonstrating how a Python program with
nice command line handling might talk to C or C++ for some function, I applaud you and
also recommend that you also explore:

* Boost::program_options
* Boost::Python
* SWIG
* Shed Skin (Python -> C++ compiler) ...have fun!

Nah, to do that I'd have to import the ctypes [python.org] library. It would have added a few LOC. I'm interested in integrating C/C++ code if necessary for performance, but I haven't focused on that yet, preferring pure Python for its simplicity.

But, similar to Shed Skin, PyPy [codespeak.net] is pretty nifty. It's currently Python written on top of Python, but you can "translate" the high-level code to C, .Net, JVM, even Javascript (!). It's very similar to the pseudo-code and code generators in books like the Pragmatic Programmer. Crazy.

Joke? (0)

Anonymous Coward | more than 7 years ago | (#20032509)

What kind of joke is this? The example in listing 1 is using strtok() to do something it can't do, and even if it did what the authors intended, they wrote comments documenting something else.

Re:Joke? (0)

Anonymous Coward | more than 7 years ago | (#20032711)

probably H1B PhD.

Re:Joke? (0)

Anonymous Coward | more than 7 years ago | (#20032811)

> probably H1B PhD.

Yeah, Bill Gates just can't get enough of these guys. Put 5000 in a room for a year and they'll bang out total crap that you'll be forced to sell to reclaim your 'investment'. If you're really unlucky, they may even leave you with a real stinker like Windows Vista.

Correction... (1, Insightful)

Pedrito (94783) | more than 7 years ago | (#20032529)

Just about any relatively complicated software has dozens of available command-line options.

That should probably be rephrased to "Just about any relatively complicated software that inflicts command-lines on its users..."

This is clearly a very Unix-oriented post, as there are relatively few command-line Windows apps and few Windows GUI apps that accept command-lines. But this is also a topic that's about as old as programming itself, and clearly something that takes the "new" out of "news".

Re:Correction... (1)

AuMatar (183847) | more than 7 years ago | (#20032647)

Umm, most Windows apps accept command line inputs - it's just not the default way of using them. But type it in at the command line and you'd be surprised. A few that come to mind: VC++'s compiler and Internet Explorer.

Re:Correction... (1)

Ambiguous Puzuma (1134017) | more than 7 years ago | (#20032677)

You might be surprised. Command line options may not be featured prominently in Windows applications, but that doesn't mean they're not there. If you have Microsoft Visual Studio, for example, try "devenv /?" sometime. (For non-Windows users: Devenv.exe is the executable to start Visual Studio's IDE.)

Re:Correction... (1, Interesting)

Anonymous Coward | more than 7 years ago | (#20032679)

Hardly the case. Most of the win32 shit I've used accepts command lines. It's much simpler and a more powerful debugging tool than forcing a config file change for every attempt.

Re:Correction... (1)

Maniac-X (825402) | more than 7 years ago | (#20033639)

That's not true. Most Windows programs accept command-line arguments (just take a look at ANY game, as a matter of fact); they're simply not used often because most Windows users a) don't know they exist, b) wouldn't know how to do it without some detailed instruction, and c) would probably not see the point in trying it anyway.

Not for command lines ... (1)

FrnkMit (302934) | more than 7 years ago | (#20032731)

I haven't even read TFA, but I know that gperf isn't for command lines; getopt() (in its various forms) more than adequately does its job.

One real use of gperf and perfect hashes that I know of is in TAO (The ACE ORB), an implementation of CORBA. Since CORBA includes the class and method names as strings, a perfect hash speeds up each lookup of the actual routine to call.

In modern times, I can imagine gperf (or a Java/C#/Ruby/whatever port) speeding up SOAP or other XML-based protocols.

I like gperf, but... (0)

Anonymous Coward | more than 7 years ago | (#20032735)

...it's more than a little pointless to use it for command-line options, especially in C++. For
one thing, as others have pointed out, I have a hard time imagining a case in which command-line
parsing is a real bottleneck for any application. And, given that that's the case, having to write
lots of special functions and use extra tools for something that is a problem solved well through
freely-available libraries seems like something of a waste of time. I assume that the true purpose of
the article was to remind people of gperf.

Respectfully to the IBM authors, you might as well just use lex and perhaps yacc if you're
dealing with C and need to write a parser, or a library that does a much better job of handling
command-line options (such as GNU getopt) and their problems which range far beyond merely parsing
things.

With C++, you have those libraries available as well, but if you want to try other approaches, Boost
("http://www.boost.org") has a very nice command-line option library that also sports an expressive
notation for describing the options in code.

In any case, it's nice to see an article on gperf, but here it felt somewhat ridiculously applied.

Wrong in so many ways (4, Insightful)

geophile (16995) | more than 7 years ago | (#20032819)

Perfect hash functions are curiosities. If you have a static set of keys, then with enough work you can generate a perfect (i.e. collision-free) hash function. This has been known for many years. The applicability is highly limited, because you don't usually have a static set of keys, and because the cost of generating the perfect hash is usually not worth it.

Gperf might be reasonable as a perfect hash generator for those incredibly rare situations when the extra work due to a hash collision is really the one thing standing between you and acceptable performance of your application.

I thought maybe we were seeing a bad writeup, but no, it's the authors themselves who talk about the need for high-performance command-line processing, and give the performance of processing N arguments as O(N)*[N*O(1)]. I cannot conceive of a situation in which command-line processing is a bottleneck. And their use of O() notation is wrong (they are claiming O(N**2) -- which they really don't want to do, not least because it's wrong). O() notation shows how performance grows with input size. Unless they are worrying about thousands or millions of command-line arguments, O() notation in this context is just ludicrous.

I don't know why I'm going on at such length -- the extreme dumbness of this article just set me off.

Re:Wrong in so many ways (1)

pclminion (145572) | more than 7 years ago | (#20032937)

Gperf might be reasonable as a perfect hash generator for those incredibly rare situations when the extra work due to a hash collision is really the one thing standing between you and acceptable performance of your application.

The primary REAL use of gperf is generating keyword recognizers for language parsers. It's another tool in the same vein as lex and yacc.

Re:Wrong in so many ways (0)

Anonymous Coward | more than 7 years ago | (#20033319)

That is because you are too dumb to understand the article... the code becomes difficult to maintain with time as the number of if-else comparisons increases. The authors are specifically pointing to such scenarios, where the number of options and their parsing increases. gperf is simply a way to generate the code automatically. It is not the only way, and it wouldn't be too hard to write a similar function anyway.

Historically? (3, Insightful)

ClosedSource (238333) | more than 7 years ago | (#20032839)

"Command-line processing is historically one of the most ignored areas in software development."

This is like saying that walking is historically one of the most ignored areas in human transportation.

is this a joke? (2, Insightful)

oohshiny (998054) | more than 7 years ago | (#20032879)

If it's not, the author of that article should be kept as far away from writing software as possible; he epitomizes the attitude that so frequently gets C++ programmers into trouble.

gcc: the perfect candidate? (1)

e9th (652576) | more than 7 years ago | (#20032905)

Tons of options, but what do we see? Only stuff like

else if (! strncmp (argv[i], "-print-file-name=", 17))

Maybe they're just too scared of its present options processing to change it.

Is this a fucking joke? (2, Funny)

pclminion (145572) | more than 7 years ago | (#20032927)

Where's the Foot icon? Optimizing command line parsing? Oh God, my sides are splitting.

This is ridiculous (1)

Bluesman (104513) | more than 7 years ago | (#20032979)

First of all, how many programs have command line parsing as a bottleneck?

Secondly, they should put this functionality into GCC instead, so that it creates a perfect hash for any large switch statement.

Re:This is ridiculous (0)

Anonymous Coward | more than 7 years ago | (#20033201)

Switch statements use integer keys, so you don't need a hash table. You can directly index into a jump table (assuming the indexes are reasonably compact; if your only cases are 0, a million, and a billion, obviously a compiler would rather use if-else statements). It's very low overhead, which was the whole point of including a switch statement to begin with. :)

Re:This is ridiculous (1)

larry bagina (561269) | more than 7 years ago | (#20033207)

gperf is concerned with string hashing; C switch statements use integers. All modern C compilers (even gcc) look at the case density and build an indirection table or a set of if/else branches (or sometimes both).

wow, how pointless is that (1, Insightful)

Anonymous Coward | more than 7 years ago | (#20033283)

I've probably used more time typing this message than every program I've ever run has used parsing command line arguments.

gperf for options?!?!?! (0)

Anonymous Coward | more than 7 years ago | (#20033421)

?!?!?\N{INTERROBANG}

Talk about overkill. gperf is for parsers. Yeah, I know getopt is itself a parser, but I think anyone who's done real programming knows what I mean.

Besides, perfect hashes are old and busted. Cuckoo hashes give you almost identical performance and are far more flexible.

Another approach - parseargs (2, Interesting)

argent (18001) | more than 7 years ago | (#20033443)

Something Eric Allman wrote many moons ago. I found it and modified it to support "native" command line syntax on MS-DOS, VMS, and AmigaDOS, and added some support for improved self-documentation... and then Brad Appleton saw it and rapidly enhanced it to support a plethora of shells and interfaces until it took up 10 posts in comp.sources.misc.

The following two directories should bring it up to the latest version I know of.

This is not efficient, mind you. Command line parsing doesn't generally need to be efficient, even by my miserly standards, honed when a PDP-11 was something you hoped to upgrade to... some day...

ftp://ftp.uu.net/usenet/comp.sources.misc/volume29/parseargs/ [uu.net]
ftp://ftp.uu.net/usenet/comp.sources.misc/volume30/parseargs/ [uu.net]

PARSEARGS
 
                        extracted from Eric Allman's
 
                            NIFTY UTILITY LIBRARY
 
                          Created by Eric P. Allman
                            <eric@Berkeley.EDU>
 
                        Modified by Peter da Silva
                            <peter@Ferranti.COM>
 
                  Modified and Rewritten by Brad Appleton
                          <brad@SSD.CSD.Harris.COM>
Brad's latest work in this area seems to be here:

http://www.cmcrossroads.com/bradapp/ftp/src/libs/C++/CmdLine.html [cmcrossroads.com]

http://www.cmcrossroads.com/bradapp/ftp/src/libs/C++/Options.html [cmcrossroads.com]

Silly (1)

m.dillon (147925) | more than 7 years ago | (#20033559)

This is kinda silly. If you only have a few keywords you don't need anything sophisticated. If you have more than a few but not more than a few dozen, it's usually easiest just to arrange them in a linear array and do an index lookup based on the first character to find the starting point for your scan. More than that and you will want to hash them or arrange them in some sort of topology such as a red-black tree.

Generally speaking, hashes are very CPU- and cache-inefficient beasts, especially if one could otherwise reap the benefit of the locality of reference you get with other schemes. Hashes are easy to implement, though, so if you have a lot of keywords and there is either no locality of reference anyway or you don't care about the performance, a hash works just fine.

Insofar as strings go, once you get beyond a certain point it's easiest to just hash the string on the front-end, deal with any collisions on the front-end as well (i.e. implement a string table and modify the hash value for one of the strings if a collision occurs), and then simply reference the string via its hash value in the remainder of the program instead of actually doing any further string comparisons. As an extension of this, one can use a larger 64-bit hash and consider any collisions to be fatal. This is extremely viable for a language parser, given that the chances of a collision actually occurring are so low you might get only one, or even zero, across the entire domain of source code in existence today.

If you have a fixed set of keywords, then a 16 or 32 bit hash is usually sufficient to avoid collisions. At this point you just generate a header file with the values and switch on them. e.g. hv = hash(str); switch(hv) { case KEYWORD_FOR: ... case ... }. This is equivalent to the use of some sort of data structure but it winds up being coded and optimized directly by the compiler, and it's very easy to understand the resulting source code.

-Matt

And it's a gpl tool (1)

Suicyco (88284) | more than 7 years ago | (#20033713)

Which means that using it at the command line is "linking" it. Doing so, of course, means your upstream code must be GPL as well, ad infinitum. Sorry, but the bulk of C/C++ code out there is non-GPL licensed and therefore can take no advantage of tools such as this.