Programmer's Language-Aware Spell Checker?
Jerry Asher writes "Not all of my coworkers are careful about spelling errors. Sometimes this causes real embarrassment as spelling errors creep into software interfaces. Does anyone know of spell checkers for programming languages? I don't want a text spell checker; I want a programming-language-aware spell checker: one that I can pass all of my code through and that will flag spelling errors in function names, variable names, and comments, but will ignore language keywords, language constructs and expressions, and various programming styles (camel case, or underscores, or...). I want a spell checker that knows that void *functionSigniture(char *myRoutine) contains one spelling error. Does anyone have such a thing for Java or C++? Are there any Eclipse plugins that do this?"
Eclipse WTP 3.3 Europa seems to do this.. almost. (Score:5, Informative)
Man Dies Waiting for Eclipse to Launch (Score:5, Funny)
A software engineer in San Jose, CA was found dead at his desk yesterday, apparently having died while waiting for his Java editing program, Eclipse, to finish its boot process. Coworkers say the engineer came in that morning vowing to "get Eclipse working on his box or die trying." The last thing anyone heard him say aloud was the cryptic comment: "I see the splash screen is appropriately blue." Nobody knows what he meant. The man was then thought to have fallen asleep, but hours later it was discovered that the engineer had died suddenly of apparent natural causes. The forensics team's investigation that evening was reportedly interrupted unexpectedly when the dead man's Eclipse program suddenly finished launching. The team tried to interact with it to see if they could find clues about the man's death, but the program was unresponsive and the machine ultimately had to be rebooted. At this time, the police commissioner says there is no evidence of foul play, and they currently believe the man simply died of either boredom or frustration.
Re:Man Dies Waiting for Eclipse to Launch (Score:5, Informative)
Re:Man Dies Waiting for Eclipse to Launch (Score:4, Interesting)
Work in a Windows environment in Virginia. Access the Eclipse workspace directory through a mounted drive pointing to your home directory on a UNIX box in Montana. On the UNIX machine, your home directory is actually mounted on a Windows box back in Virginia.
God help you if you have the "compile on save" option enabled. And don't even THINK of rebuilding the workspace.
And yes, I know this from experience.
Re:Eclipse WTP 3.3 Europa seems to do this.. almos (Score:3, Insightful)
Re: (Score:3, Informative)
The idea isn't anywhere near as nuts as you think it is, provided you make a habit of using meaningful variable/class names.
Re:Eclipse WTP 3.3 Europa seems to do this.. almos (Score:5, Funny)
ego != good_open_minded_programmer (Score:5, Insightful)
You clearly fail to see that a programmer can also create their own function names, as well as use other people's functions. So you prove you are a very inexperienced programmer (and closed-minded), which adds weight to the idea that you are either young or just arrogant. Also, your very apparent need to show hostility shows a degree of insecurity, where you are overcompensating by verbally hitting out at others in an attempt to appear more knowledgeable than you really are.
The easiest way to become a better programmer is to be more open-minded. So far you have failed to demonstrate this.
As a side note (back in the DOS days of programming), I found the spell checker in Multiedit very useful, especially when having to work very late at night, after the coffee stopped working!
Re:ego != good_open_minded_programmer (Score:5, Funny)
We're the do-anything team that specialises in imaginging new ways for you to reach your audience.
The word "pwned" doesn't spell check correctly either, but it is applicable.
Re: (Score:3, Interesting)
"Misspelt" is a legitimate spelling in British English. It's in the OED, with examples from 1762 to 1990.
Since I have just corrected you, I assume I have made an error somewhere in this post, though I haven't managed to find it.
Re:What the fuck is the OP on? (Score:5, Funny)
Re: (Score:3, Interesting)
The former can be done by a simple regexp, the latter... you can do a LALR parser, but why even bother? Just look for _any_ potential identifier; in most languages, that's [a-zA-Z_][a-zA-Z_0-9]*; and simply add the few keywords which are not English words to your dictionary. In fact, this would be nearly programming-language agnostic.
When it come
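For what it's worth, the "just grab anything that looks like an identifier" idea can be sketched in a few lines of Python. This is only a rough sketch under assumptions: the keyword allow-list below is a made-up sample rather than a complete list for any language, and the function name candidate_identifiers is hypothetical. It also makes no attempt to skip string literals or hex constants.

```python
import re

# Identifier shape described above: a letter or underscore,
# followed by letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

# Hypothetical sample of keywords that are not English words.
# A real list would depend on the language being checked.
NON_ENGLISH_KEYWORDS = {"int", "char", "const", "struct", "enum", "typedef", "sizeof"}


def candidate_identifiers(source):
    """Return identifiers in order of first appearance, minus allow-listed keywords."""
    seen = []
    for match in IDENTIFIER.finditer(source):
        word = match.group(0)
        if word not in NON_ENGLISH_KEYWORDS and word not in seen:
            seen.append(word)
    return seen


print(candidate_identifiers("void *functionSigniture(char *myRoutine)"))
```

The resulting list would still have to be split into words and handed to an ordinary spell checker; this step only separates candidate names from punctuation and non-English keywords.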
Re:Eclipse WTP 3.3 Europa seems to do this.. almos (Score:5, Funny)
"You appear to be creating an infinite loop. Would you like me to increment your counter variable?"
"You appear to be writing a virus, would you like a list of the latest Windows Vista sploits?"
Re: (Score:3, Informative)
2. There are some practical ways to construct proofs that a loop ends (remember the CS lectures). Sure, it's not a perfect solution, but if you can't construct a proof that the loop ends, you'd better rethink the loop, and possibly rewrite it.
Re: (Score:3, Informative)
It's impossible for a computer program to be constructed which can do so for all cases (hence, the halting problem), but that doesn't mean that it's impossible to detect some infinite loops, or to detect constructs which are particularly likely to be infinite loops, either of which could, in theory, be useful features in an IDE.
Spelling/grammar checkers for human language aren't flawless, either, but they still have uti
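As a toy illustration of "detect constructs which are particularly likely to be infinite loops", here is a sketch of a heuristic check. It is an assumption-laden example, not how any IDE actually does it: it inspects Python source (not Java or C++) using the standard ast module, the function name suspicious_loops is made up, and it only flags while loops whose condition is a truthy constant and whose body contains no break, return, or raise. It will miss plenty of infinite loops and can be fooled by a break that belongs to a nested loop.

```python
import ast


def suspicious_loops(source):
    """Return line numbers of `while <truthy constant>` loops with no obvious exit."""
    exits = (ast.Break, ast.Return, ast.Raise)
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.While):
            continue
        # Only consider conditions like `while True:` or `while 1:`.
        if not (isinstance(node.test, ast.Constant) and node.test.value):
            continue
        # Look anywhere inside the loop body for a statement that can leave it.
        has_exit = any(
            isinstance(child, exits)
            for stmt in node.body
            for child in ast.walk(stmt)
        )
        if not has_exit:
            flagged.append(node.lineno)
    return sorted(flagged)


SAMPLE = (
    "while True:\n"
    "    x = 1\n"
    "while True:\n"
    "    break\n"
    "while n > 0:\n"
    "    n -= 1\n"
)
print(suspicious_loops(SAMPLE))
```

Only the first loop in the sample is flagged; the third one may or may not terminate, which is exactly the part no tool can decide in general.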
Visual Assist (Score:3, Informative)
Next silly question, please.
Re:Visual Assist (Score:5, Funny)
Re:Visual Assist (Score:4, Funny)
Re: (Score:3, Informative)
Just FWIW, it checks typing in both comments and (perhaps more importantly) string literals. It's also "intelligent" enough to know (for example) that '%d' should not be treated as a problem in a string literal. It is true, however, that symbols that are misspelled don't get highlighted, provided the misspelling is consistent.
How about eyeball Mk 1? (Score:5, Funny)
Re:How about eyeball Mk 1? (Score:5, Insightful)
Re:How about eyeball Mk 1? (Score:5, Funny)
Re:How about eyeball Mk 1? (Score:5, Insightful)
May I suggest.... (Score:2, Funny)
Re:May I suggest.... (Score:5, Insightful)
Responses like this entirely miss the point of the question. Same with the "just review your code" responses. It's not a matter of making the language more readable. It's a matter of making the code more usable. Certainly, correct spelling is pointless without other elements of good code practice. However, bad spelling can add a lot of frustration.
I joined a project which already had a few misspelled class names. I'm a fast typer and often I've typed out more of a filename than is spelled correctly before hitting tab to complete the name. Needless to say, I've been trained to hit tab earlier for a few choice files. But it's certainly been an irritation. Similarly, I've been confounded more than once when a function or variable couldn't be found by the compiler, only to realize that I'd spelled a word correctly rather than how the actual name was spelled.
We choose to use English words for our class, function, and variable names for a reason. That reason is mostly defeated by misspelling the English word. A dictionary is a great idea, even for coding languages that don't "read like English".
Re:May I suggest.... (Score:4, Insightful)
Re: (Score:2)
eror: 312 varible naim mispelled
Re: (Score:2)
Incorrect spelling in code causes all sorts of minor confusion - I'd love an Eclipse plugin to address this.
Re:May I suggest.... (Score:4, Insightful)
It strikes me that the problem is that most spell checkers try to check everything, and that a lot of code has things that really shouldn't be spell checked at all, mixed with things that should. I imagine that one way to start would be to only alert on those errors that are almost correct -- if it looks like garbage, ignore it, but if it's close, assume it should be right. Perhaps ignore prefixes / suffixes as well -- pSomething is fine, pSometihng isn't. Also, CamelCase ought to be easy enough to detect -- treat it as word boundaries, and spell check the individual words. Again, egregious misspellings probably aren't -- nextObjFoo is ok, even though Obj isn't a word -- it's so far from being a word that we assume the programmer meant it that way.
Similarly, there should probably be a set of words added that aren't "English" but are used often enough to be worth adding to the dictionary. Things like Obj, Int, and Ptr.
I think the reason such spell checkers don't exist already is fairly simple -- everyone just assumes they're impossible, and doesn't try. Couple that with the fact that a mediocre quality one would be so annoying as to be worse than useless, and you have a recipe for a program that won't get written. I don't think either of those would have to be the case if someone sufficiently clever decided to attack the problem, though.
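To make the "only alert on near-misses" idea concrete, here is a minimal sketch. Everything specific in it is an assumption: the tiny DICTIONARY and JARGON sets are stand-ins for a real word list, the 0.8 similarity cutoff and the three-letter minimum are arbitrary, and the function names are hypothetical. It splits identifiers on CamelCase and underscores, skips short prefixes and known jargon, and reports a word only when it is close to a dictionary word, using difflib from the standard library.

```python
import difflib
import re

# Toy word lists, purely illustrative.
DICTIONARY = {"next", "foo", "something", "function", "signature", "routine", "object"}
JARGON = {"obj", "int", "ptr"}


def split_identifier(name):
    """Split camelCase, PascalCase, and snake_case names into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [part.lower() for part in parts]


def near_misses(name, cutoff=0.8, min_length=3):
    """Return (word, suggestion) pairs for words that are almost dictionary words."""
    findings = []
    for word in split_identifier(name):
        if len(word) < min_length or word in DICTIONARY or word in JARGON:
            continue  # short prefixes, correct words, and jargon are fine
        close = difflib.get_close_matches(word, sorted(DICTIONARY), n=1, cutoff=cutoff)
        if close:
            findings.append((word, close[0]))
        # No close match: assume the programmer meant it and stay quiet.
    return findings


for identifier in ("pSomething", "pSometihng", "nextObjFoo", "functionSigniture"):
    print(identifier, near_misses(identifier))
```

With these toy lists, pSomething and nextObjFoo pass silently, while pSometihng and functionSigniture each get one suggestion. Tuning the cutoff is where the "mediocre ones are worse than useless" problem would show up.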
Re: (Score:2)
Why don't you right click on said function and choose Refactor -> Rename, or whatever the equivalent is in your IDE?
Sounds like a good idea (Score:5, Interesting)
Re: (Score:2)
Possibly even do cross-convention linking.
I now have a new uni project I care about.
Maybe I should implement it in Emacs.
Keep text in dedicated files (Score:2, Insightful)
Re: (Score:2)
That said, I rarely see this as a big problem unless it's a very static internal interface. Pull the whole code tree
Re: (Score:3, Informative)
Overall, the answers to the submitter's question are absolutely horrible so far. If the tool he's searching for doesn't exist, it damn well should.
vim 7.0 anyone? (Score:3, Insightful)
should be fairly simple to implement (Score:2, Interesting)
It's a good question ... (Score:5, Interesting)
People here making fun of his request and saying that this should be set in stone in design documents, or be checked in peer code reviews, are obviously not working in a run-of-the-mill software company where there's neither the inclination nor the time to do everything the formal way. Also, I have yet to see the first design document that correctly enumerates all the requirements for the software, let alone all the names for the variables to be used.
Re:It's a good question ... (Score:5, Informative)
As a non-native English speaker working in a non-native English-speaking team (mainly French-speaking people), I find it a real problem. The biggest problem happens when you search for something and don't find it because you wrote it right and your coworker wrote it wrong. (Or the inverse; I don't claim to be perfect in English.)
Sure, you might say, "Write your code in French", but that's not a solution. My mother tongue is Dutch, we have a German coworker, and you never know if the next guy will be Italian. There is also this team that has to maintain code written by Spanish people.... in Spanish.... and they don't know Spanish. Fun times, if you like to hear them curse....
In multilingual environments this problem increases drastically.
how soon do you need it? (Score:2)
I am amused by the idea of being able to extend that to programming languages.
The most significant problem that I am facing has nothing to do with coding the spell-checker. It's about getting a sizable dictionary of words (finding one, converting to UTF-8, etc.).
The trouble is that programming comes with a very different set of
Re: (Score:2)
Just appoint one "spelling guru" who is the only one allowed to edit the list and add new words to it as you go. It's probably better to strictly manage the list anyway, considering the number of ways some words can be written, e.g. color vs. colour. You'd probably want only one of those variants allowed.
You could probably use a simple lexer to detect function and variable names and (customizable) regex to extract the c
Re: (Score:2)
Exactly when did I say that I code in Windows (and if I do, so what)?
And are you upset that Java is not being released under the BSD license?
I never said that I could not find a dictionary of English words. If you had read my post, you would know that I talked of dictionaries in multiple languages. Of all the languages in the world, English is just one of them. It might help to remember that much.
FxCop (Score:2, Informative)
The $$$ version of Visual Studio (the Team Suite version) comes with an introspection engine for VC++, though; it's not as flexible as FxCop, but it does the basics.
Then there's the countless "Spellchecker" plugins avail
TextMate does some... (Score:3, Interesting)
You can right-click on any "word" (variable name, subroutine name, whatever, just generally a whitespace-delimited group of characters) and it will check the spelling and present alternatives in the context menu. It also recognizes things like Perl's sigils, so correcting '$teh' turns it into '$the', not 'the'.
It _won't_ automatically check spelling except in strings (so e.g. if I have '$teh = "This is a tset.";', 'tset' will be underlined, '$teh' won't). It doesn't include comments in its automatic checking either, which is probably the most annoying part about it.
Overall I typically just don't bother with it, but someone _has_ thought along these lines, at least.
Can it also spelll check slashot polls? (Score:2)
aspell? (Score:4, Insightful)
the problem is really prevalent (Score:2)
Re:the problem is really prevalent (Score:5, Funny)
No one's offering solutions... (Score:2, Interesting)
I assume a compiler will parse the source and in the process identify which tokens are key words and literals, and which are programmer-defined identifiers in the code. The spell checker would either use the same algorithm, or latch into that part of the algorithm to get at all of the identifiers. There are two possible word separators in typical code--either capital letters or u
Tags (Score:2)
I tagged this article badlytagged
How about this (Score:5, Interesting)
Enough rant. How about this:
perl -ne "s/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/
That will give a list of unique words in your source code (use find and xargs to scan the whole source tree). Then you can run that list of words through an ordinary spellchecker such as ispell. Unfortunately when you find a mistake you have to go back and grep for it to find where it occurs. You would also need a personal dictionary for things that are not English words but nonetheless appear in code.
I would probably keep the private word list containing things like 'foreach' and 'const' with the program source code, and have a makefile target 'make spellcheck' that runs a command like the above and then prints out all words found that are not in
find . -type f -name '*.c' | xargs perl -ne "s/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/
sort -u private_word_list
diff -u allowed_words found_words | grep -E '^[+][^+]'
The private word list can be kept under version control and checked in whenever you add a new non-English word like 'Frobule' to your source code.
Adding filenames and line numbers to the output is left as an exercise for the reader. You might also want to change the perl command to ignore words with length < 5.
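Taking up that exercise, here is a rough Python sketch that keeps the filename and line number for each unknown word. Treat it as a guess at the intent rather than a port: the perl command above is cut off, so the steps after the CamelCase split (keep letters only, lowercase) are assumptions, as are the function name unknown_words and the idea of passing file contents in as a dict. It also applies the "ignore words with length < 5" suggestion.

```python
import re


def unknown_words(files, allowed, min_length=5):
    """Return (filename, line number, word) for each word not in `allowed`.

    `files` maps a filename to its source text; `allowed` is the combined
    English dictionary plus private word list, all lowercase.
    """
    report = []
    for filename, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Same first step as the perl one-liner: break fooBar into foo Bar.
            spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", line)
            for word in re.findall(r"[A-Za-z]+", spaced):
                word = word.lower()
                if len(word) >= min_length and word not in allowed:
                    report.append((filename, lineno, word))
    return report


demo_files = {
    "demo.c": (
        "int main(void) {\n"
        "    int recieveCount = 0; /* Frobule counter */\n"
        "    return recieveCount;\n"
        "}\n"
    )
}
demo_allowed = {"count", "counter", "return", "frobule"}

for filename, lineno, word in unknown_words(demo_files, demo_allowed):
    print(f"{filename}:{lineno}: {word}")
```

In a real tree you would read the files found by find and load the allowed words from the system dictionary plus the version-controlled private list.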
Re: (Score:3)
I suspect that they actually decided that TextLikeThis was easier to type, and sufficiently readable that the typing ease benefit was worth the switch. Of course that's because no one thought of making shift-<space> map to _.
FxCop (Score:5, Informative)
http://www.gotdotnet.com/Team/FxCop/ [gotdotnet.com]
Visual Assist (Score:3, Interesting)
I'm not associated with Whole Tomato, but if anyone from WT sees this, can I have a free subscription
Re: (Score:3, Interesting)
Comments like this make me wonder. Is it so hard to imagine a spelling checker for, say, the C language that finds words that were not written the way they were intended? Limiting yourself to correct English words for identifiers is stupid. Assuming that a spelling checker for a programming language would do
It's not your job (Score:2)
Nothing personal, but it's not actually a programmer's job to make sure everything is speelled correktly. This is part of the QA process before a product rolls out the door. Sure, you should do your best, but you need another pair of eyes (or several pairs of eyes) looking at the UI in addition to your own. You can easily miss the forest for the trees.
FxCop (Score:2, Redundant)
Create a dictionary for your project (Score:2, Interesting)
I used aspell and went through huge parts of the source, telling it what wasn't misspelled. It was an incredible pain in the neck because it got confused over all the variable names, bits of C syntax etc etc.
Once I had a dictionary, though, I could recheck the source periodically and although there were a lot of false warnings, we
Annoying perhaps but (Score:5, Interesting)
HRESULT MFGetService(
IUnknown* punkObject,
REFGUID guidService,
REFIID riid,
LPVOID* ppvObject
);
You'll probably just end up spending all your day removing false positives.
Ken Thompson and creat() (Score:5, Interesting)
"If I had to do it over again? Hmm... I guess I'd spell 'creat' with an 'e'."
Re: (Score:3, Interesting)
Now, in essentially every program in the world there is a function named 'create_something' or alternatively 'createSomething'. Had Ken Thompson's creat() function been spelled create(), early C compilers would have treated them the same way, thus making any function starting with
Emacs - ispell-check-comments (Score:3, Informative)
I'm sorry... (Score:3, Interesting)
I don't think I'd like to hire someone who can't spell. It speaks volumes about you.
Intelligence starts with a keen understanding and application of your language.
If you simply must have it, EditPlus has syntax highlighting and offers spellchecking dictionaries.
English (or $YOUR_LANGUAGE_HERE) (Score:3, Insightful)
At least in the real writing business there are editors trained and paid to catch these errors.
Being unable to spell correctly makes you look really stupid to most people.
Just FYI, if you have a decent programming environment, it should at least flag cases where you've mistyped an existing identifier. If there's an ImmediateFlag in your code, you'd get a warning if you typed ImediateFlag or ImmediateFalg or whatever. Not much help when the programmer is creating new identifiers, of course. Although I've seen cases where the programmer in question for whatever reason decided that because ImediateFlag was undefined then they would just define it, even though ImmediateFlag existed and was what they meant. That ought to get you fired in my book.
Hey by the way, pair programming is a great way to have continuous code reviews and a check on some of the more typical fumble-finger errors.
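The ImediateFlag-next-to-ImmediateFlag case is the one a tool can catch even without a dictionary: just look for identifiers that are suspiciously similar to each other. A minimal sketch, assuming you already have the list of identifiers from somewhere; the function name suspicious_pairs and the 0.9 cutoff are made up for illustration, and the pairwise comparison is quadratic, so it would need smarter bucketing on a large code base.

```python
import difflib
import itertools


def suspicious_pairs(identifiers, cutoff=0.9):
    """Return pairs of distinct identifiers that are nearly identical."""
    names = sorted(set(identifiers))
    pairs = []
    for first, second in itertools.combinations(names, 2):
        similarity = difflib.SequenceMatcher(None, first, second).ratio()
        if similarity >= cutoff:
            pairs.append((first, second))
    return pairs


for pair in suspicious_pairs(["ImmediateFlag", "ImediateFlag", "RetryCount"]):
    print("possible typo:", pair)
```

It cannot tell which of the two spellings is the intended one, only that somebody should look.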
Re:How about the Built-in OS X spell checker? (Score:5, Funny)
We're talking about programming, friend.
Re:Syntax Highlighting (Score:4, Insightful)
Re:simple (Score:5, Insightful)
Re: (Score:3, Insightful)
To the original question: is strncpy misspelled? What about foo? sqrt? exp? Impl? Programese has an interesting linguistic history and its lexicon contains much not found in English.
While misspelled variable and function names are annoying, a refactor tool and a compile make them relatively painless. Perhaps the best approach would be to take your API documentation, run a script to split CamelCase an
Re: (Score:2)
Yeah, that's great and all, but this is something you can only do for small algorithms. Proving correctness of large systems takes huge amounts of time and resources, so for most developers this is just not feasible. Also a while ago, someone notable in the Java world (Gafter? Bloch? Can't seem to find the link) blogged about how he had discovered that one of the core textbook examples taught t
Re: (Score:2)
Found it... Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken [blogspot.com]. I like his final paragraph:
"We programmers need all the help we can get, and we should never assume otherwise. Careful design is great. Testing is great. For
Re: (Score:2)
The bug is a consequence of integer overflow, which is a perfectly common type of bug in languages where "int" and the like overflow. But really, in most cases it's insane to expose messy internal details of the representation and computer to the programmer.
Yes, sometimes you need to. But most of the time you should be using a language and/or library that encapsulates details like that so that you can say "some integer" and have the language and/or library take care of whatev
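For anyone who hasn't read the linked post: as I understand it, the bug is in computing the midpoint as (low + high) / 2 with fixed-width ints. Python integers don't overflow, so the sketch below fakes a signed 32-bit int with a hypothetical helper, to_int32, just to show the effect; the index values are arbitrary large numbers chosen so that the sum exceeds 2^31 - 1.

```python
def to_int32(value):
    """Wrap a Python int to a signed 32-bit value, the way a 32-bit int would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


low, high = 1_500_000_000, 2_000_000_000

# Textbook midpoint: the sum wraps around before the division happens.
broken_mid = to_int32(low + high) // 2

# Overflow-safe form: the difference always fits, so nothing wraps.
safe_mid = low + to_int32(high - low) // 2

print("broken:", broken_mid)
print("safe:  ", safe_mid)
```

The broken midpoint comes out negative, which in a real binary search means an out-of-range array index; the second form stays in range, and so would a language whose integers simply grow as needed.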
Re: (Score:2)
the key skills of programming, spelling and pretty much any other ability you'll ever attempt to acquire.
In English enumerations, you have a comma before the last and. You should have written:
programming, spelling, and pretty much any other ability
Oh beautiful irony. Now go to a library and pick up a grammar book. Or maybe, just maybe, realize that everyone isn't perfect and that spelling isn't viewed as really important by that many, and that a spellchecker is a really useful tool.
You can have the benefit of the doubt... (Score:2)
Re: (Score:2)
*searches for the delete button*
Re: (Score:2)
"Ya" is clearly intentional and comes from a dialect, so that's ok. I'm not one either, so this means I get to shout at you, right?
Re: (Score:2)
C++ requires that a parameter passed to a function have a different name from the same class member.
I don't understand this - are you saying you can't have
Re: (Score:3, Informative)
(require 'flyspell)
(require 'cc-subword)
(defvar ps-flyspell-check-subwords nil
"*Non-nil if Flyspell should check subwords separately.
If this variable is set to non-nil, an identifier such as
MyLongFunctionName will be treated as four separate words (My,
Long, Function, Name) for the purposes of Flyspell.")
(defadvice flyspell-region (around subword-checking (beg end))
"Check individual subwords if ps-flyspell-check-subwords is set."
(i