
The Definitive ANTLR Reference

samzenpus posted more than 6 years ago | from the read-all-about-it dept.

Book Reviews 95

Joe Kauzlarich writes "Finally, someone has done us all the great service of publishing a book about the second most well-known compiler compiler, Terence Parr's Antlr; moreover, it was written by Parr himself and published as part of the somewhat-usually-reliable Pragmatic Bookshelf series. Take note: while it requires a JVM to run, Antlr is not just for Java developers; it generates parsers in Python, Ruby, C, C++, C# and Objective-C. Also note that this book is more than just an elaborated man page; it is also an excellent introduction to the concepts of compiler and parser design." Keep reading for the rest of Joe's review.

First off, I have no preference between Yacc-style parsers, JavaCC and Antlr. I've never used Yacc, I used JavaCC in college and have since played with Antlr, and I am just as ignorant in the use of them all. The fundamental difference is that Antlr is a top-down LL(*) (simply put, variable-lookahead) parser generator, while Yacc is a bottom-up LR parser generator. JavaCC is also top-down, but employs a different parsing strategy. The book describes the meanings of these terms in simple detail.
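To make "variable lookahead" concrete, here is a hand-written Python sketch (not Antlr output) of a decision that no fixed-k LL parser can make: a hypothetical rule whose two alternatives both begin with any number of identifiers and differ only in the token that follows them. ANTLR 3 builds lookahead automata for decisions like this when it analyzes the grammar; the loop below only mimics their effect, and `predict_stat` is an illustrative name.

```python
# Hypothetical toy rule (not from the book):
#   stat : ID+ ';'        -> "declaration"
#        | ID+ '=' ID     -> "assignment"
# Both alternatives start with one or more IDs, so no fixed number k of
# lookahead tokens can choose between them for every input.

def predict_stat(tokens, pos=0):
    """Pick an alternative for `stat`, scanning ahead as far as needed."""
    i = pos
    while i < len(tokens) and tokens[i].isidentifier():
        i += 1                      # skip the shared ID+ prefix, however long
    if i == pos:
        raise SyntaxError("expected an identifier")
    if i < len(tokens) and tokens[i] == ";":
        return "declaration"
    if i < len(tokens) and tokens[i] == "=":
        return "assignment"
    raise SyntaxError(f"no viable alternative at token {i}")


print(predict_stat(["unsigned", "long", "x", ";"]))  # declaration
print(predict_stat(["a", "=", "b"]))                 # assignment
```

An LL(k) parser with, say, k = 2 would fail on inputs with three or more leading identifiers; the scan above simply keeps going until the distinguishing token appears.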

I have learned from experience that good documentation for any of these products is hard to come by and difficult to follow, simply because the subject matter is obtuse and few, until now, have ventured to write expository literature to explain the myriad concepts to the non-academician. Of the three mentioned above, Antlr appears to be the most 'modern', and it can also generate lexers from within the same grammar definition file, so the notions are integrated. Antlr also has a useful IDE called AntlrWorks with visualization features, which makes grammar construction far simpler for a beginner.

That said, I don't wish to use this review to push Antlr over its alternatives, but only to press the point that this book serves to introduce the average programmer not only to Antlr, but to the concepts of parser design as well. Those concepts become necessary to understand while writing and debugging grammars, as not everything written in Backus-Naur Form will produce a working parser, and this holds true for any parser generator. Learning what works and what doesn't, as well as what workarounds are available, is key to becoming proficient in Antlr, Yacc or JavaCC. Once proficiency is achieved, you'll have the valuable skill of producing domain-specific languages on demand.
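A small illustration of that point, as a hand-written Python sketch for a hypothetical integer-expression grammar (not code produced by Antlr): the natural left-recursive BNF rule for subtraction sends a naive top-down parser into infinite recursion, whereas the equivalent EBNF loop form parses the same inputs and keeps subtraction left-associative. The class and function names are illustrative only.

```python
import re

# Hypothetical grammar. The left-recursive BNF form
#   expr : expr ('+'|'-') term | term ;
# makes a naive top-down parser call expr() forever. The EBNF form
#   expr : term (('+'|'-') term)* ;
#   term : INT | '(' expr ')' ;
# accepts the same language and suits a top-down parser.

def tokenize(text):
    # integers, or any other single non-space character
    return re.findall(r"\d+|\S", text)


class ExprParser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):      # the (...)* loop replaces left recursion
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs  # left-associative
        return value

    def term(self):
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = ExprParser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

With this form, `evaluate("10 - 4 - 3")` gives 3. A right-recursive rewrite such as `expr : term ('-' expr)?` would accept the same strings but group them as 10 - (4 - 3) = 9, which is why the loop form is the usual workaround.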

Terence Parr, as mentioned before, is not only the author and maintainer of Antlr, but he wrote the book as well. Antlr is on its long-awaited third version and has been maintained by Parr throughout the project's lifetime. He is a university professor and himself developed the path-breaking LL(*) parsing strategy employed by Antlr.

Parr begins with a one-chapter background in computer language design before diving into a simple example of a parser for basic integer expressions. Part II is the meat of the book, describing various aspects of writing grammars for Antlr. Generally speaking, he covers the basic semantics of grammar writing, the many optimization, supplementary and 'workaround' options provided by Antlr, grammar actions and attributes, syntax trees, error reporting and related practical topics.

The third part, Understanding Predicated LL(*) Grammars, is the valuable 'textbook' portion of the book. It gives readers a short and comprehensible introduction to exactly what predicated-LL(*) means as well as a look at how competing parser generators work in contrast.

Both the second and third parts are sprinkled with theoretical tidbits to help language designers better understand why grammars must work as they do. Those who can't pick their nose without a rudimentary theoretical overview of the subject can enjoy a few casual browsings through the book before even sitting in front of a computer. It works *almost* that well as a textbook, though it still doesn't approach such classics as Aho et al.'s Compilers: Principles, Techniques, and Tools (if you want to get seriously involved in compiler design). Take it for what it is, though: a chance to learn a tool of possible value without, on the one hand, having to dig through old mailing lists and last-minute READMEs, as was much the case a year ago, or, on the other hand, devoting painstaking class and study time to a lot of theory you won't find of practical value.

So I'll recommend this book on the basis that there's nothing else like it available; and don't wait until a project comes along that requires knowledge of compiler design, because there's a heck of a learning curve (I'm still on the very low end and I wrote a compiler in college). If you think compiler or parser design is interesting or may conceivably write a domain-specific language for your workplace, the Definitive Antlr Reference is not only a good place to start, but one of the only places to start short of signing up for a university course.

You can purchase The Definitive ANTLR Reference from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.


LR versus LL (1)

OrangeTide (124937) | more than 6 years ago | (#23574247)

Why is it called ANTLR when it's an LL parser? Personally I prefer LR parsers, even though you can't parse Java with them (as far as I know).

Re:LR versus LL (4, Funny)

JonTurner (178845) | more than 6 years ago | (#23574295)

Gotta look at the prefix: "ant" meaning "not".
IOW, AntLR is a "not LR" parser.

Hey, it made sense when I wrote it.

Re:LR versus LL (0, Flamebait)

Anonymous Coward | more than 6 years ago | (#23575097)

Mod this down, it is not correct at all. Is this what passes for an intelligent comment these days?

Re:LR versus LL (0)

Anonymous Coward | more than 6 years ago | (#23575461)

AC, have you no sense of humor? I was having a laugh based on the parent LL/LR parser comment.

Re:LR versus LL (0)

Anonymous Coward | more than 6 years ago | (#23575597)

AC, have you no sense of humor?
There's more to humour than merely saying something false. (And no, I'm not the same AC.)

Re:LR versus LL (0)

Anonymous Coward | more than 6 years ago | (#23576307)

there's more to life than being a stick in the mud. and no I am not the same AC

Re:LR versus LL (0)

Anonymous Coward | more than 6 years ago | (#23588603)

Congratulations on completely missing the point. Far too many people on Slashdot post utter crap, without any hint of humour whatsoever, and then scream "OMG you can't take a joke" whenever anyone tries to correct them.

Re:LR versus LL (2, Informative)

compro01 (777531) | more than 6 years ago | (#23574387)

I believe the name means ANother Tool for Language Recognition. Nothing to do with LL vs. LR.

Re:LR versus LL (0)

OrangeTide (124937) | more than 6 years ago | (#23574847)

But ANTLR is an LL parser. it literally has to do with LL!

Re:LR versus LL (1)

Jansingal (1098809) | more than 6 years ago | (#23580061)

why the obsession over an acronym?

Re:LR versus LL (1)

compro01 (777531) | more than 6 years ago | (#23581403)

Because we're bored and have nothing better to do on a Wednesday night.

Re:LR versus LL (0)

Anonymous Coward | more than 6 years ago | (#23574537)

LR is for "language recognition", but I see what you're saying.
You should write an LR parser and call it ANTLL.

Re:LR versus LL (0)

Anonymous Coward | more than 6 years ago | (#23576293)

I've never bothered to look into antlr, and somehow always assumed it was LR. Better stick with yacc then.

Antlr Parses Java Just fine (1)

tempest69 (572798) | more than 6 years ago | (#23576607)

I had a compiler class a few years back, we wrote a java compiler that produced MIPS code..

sure, we skimped on the java implementation.. but it was able to handle simple functions like factorial, sorting algorithms, and Objects.. I didn't manage to get inheritance to work, but that's my goof.

Storm

Re:LR versus LL (2, Funny)

DrEasy (559739) | more than 6 years ago | (#23577451)

He should have called it Parrser, named after the author, plus it has a nice pirate-y ring to it.

Re:LR versus LL (1)

427_ci_505 (1009677) | more than 6 years ago | (#23579107)

1. Yes, ANTLR is an LL parser. There is a java parser implementation in ANTLR as well.

2. The set of languages parseable by an LR parser is a superset of the set parseable by an LL parser.

Ergo, yes, you can indeed parse Java with an LR parser.
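For contrast with the top-down side of this thread, here is a minimal hand-coded shift-reduce (bottom-up) sketch in Python for a hypothetical left-recursive rule `E : E '-' NUM | NUM`. A real LR parser drives these shift/reduce decisions from generated state tables; the if-statements below only stand in for them. The sketch shows that left recursion poses no problem bottom-up; it does not by itself demonstrate the language-class claim in point 2.

```python
# Hypothetical left-recursive grammar, usable as written by a bottom-up parser:
#   E : E '-' NUM | NUM
# The reduce conditions below replace the tables a tool like yacc would generate.

def shift_reduce(tokens):
    stack = []                               # entries are (symbol, value) pairs
    for tok in tokens + ["$"]:               # "$" marks the end of input
        # Reduce while the top of the stack matches a right-hand side.
        while True:
            if (len(stack) >= 3 and stack[-3][0] == "E"
                    and stack[-2][0] == "-" and stack[-1][0] == "NUM"):
                rhs = stack.pop()[1]
                stack.pop()                  # the '-' token
                lhs = stack.pop()[1]
                stack.append(("E", lhs - rhs))       # E : E '-' NUM
            elif len(stack) == 1 and stack[0][0] == "NUM":
                stack[0] = ("E", stack[0][1])         # E : NUM
            else:
                break
        if tok == "$":
            break
        # Shift the next token.
        stack.append(("-", None) if tok == "-" else ("NUM", int(tok)))
    if len(stack) != 1 or stack[0][0] != "E":
        raise SyntaxError("input is not a valid E")
    return stack[0][1]
```

`shift_reduce(["10", "-", "4", "-", "3"])` reduces `10 - 4` before shifting the second `-`, so it returns 3, the left-associative result, without any rewriting of the rule.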

pot, kettle (-1, Troll)

Speare (84249) | more than 6 years ago | (#23574277)

the subject matter is obtuse and few, until now, have ventured to write expository literature to explain the myriad concepts to the non-academician

Or in other words, most of them used big words to impress or confuse you.

Re:pot, kettle (1)

maxume (22995) | more than 6 years ago | (#23574377)

You did better than I did, I got confused at "Kauzlarich".

Re:pot, kettle (1)

Usquebaugh (230216) | more than 6 years ago | (#23577329)

Yes indeed,

A fine single malt that Kauzlarich

What are the most "standard" parser generators? (4, Interesting)

mrchaotica (681592) | more than 6 years ago | (#23574321)

I recently ran across this problem at my job: we maintain compilers for several(!) in-house languages, and I recently re-wrote the one for the simplest of them, changing it from a collection of three separate utilities (the most complicated of which was written in FORTRAN, which is generally horrible for manipulating text) into a Lex/Yacc/C (or rather, Flex/Bison/C) compiler.

I chose Lex and Yacc not because they were good, but because they're (in my opinion) very likely to be around 50 years from now. Are there any other compiler generators (such as possibly ANTLR) that might also meet this criteria, and would have been a better choice?

Re:What are the most "standard" parser generators? (-1, Troll)

Anonymous Coward | more than 6 years ago | (#23574437)

The word "criteria" is the plural of "criterion", please get it correct next time.

Best regards!

Re:What are the most "standard" parser generators? (1)

mrchaotica (681592) | more than 6 years ago | (#23574501)

Quite right; sorry!

Re:What are the most "standard" parser generators? (4, Funny)

cptnapalm (120276) | more than 6 years ago | (#23574587)

Nothing like a grammar parsing topic to bring out the language Nazis.

Re:What are the most "standard" parser generators? (1)

TaleSpinner (96034) | more than 6 years ago | (#23594805)

> Nothing like a grammar parsing topic to bring out the language Nazis.

Ve have VEYS uff makink you parse...

Re:What are the most "standard" parser generators? (1)

Jansingal (1098809) | more than 6 years ago | (#23580099)

so why not then say 'bests regards'?

Re:What are the most "standard" parser generators? (1)

morgan_greywolf (835522) | more than 6 years ago | (#23574659)

PLY, although this is an implementation of lex/yacc in Python.

Re:What are the most "standard" parser generators? (0)

Anonymous Coward | more than 6 years ago | (#23576385)

ANTLR is used in a surprising number of places, especially as a parser for SQL and SQL-like languages for Enterprise Java systems.

There are 2 things that endear it to me over lex/yacc:

1. ANTLR realizes that lexical analysis is itself a parsing operation, and thus permits using the same facility to build the lexer as the parser.

2. Unlike yacc, which produces a program for a finite-state machine, ANTLR produces a goal-directed program. The problem with debugging FSMs is that most of what you're stepping through is the FSM itself; it's hard to catch a parsing operation going unexpectedly off-course. In an ANTLR-generated parser, you're stepping through logic that essentially maps the syntax definition instead of stepping through the engine.
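Point 1 can be pictured with a hand-written Python sketch, not ANTLR-generated code: the hypothetical token rules `INT`, `ID` and `WS` are coded in the same goal-directed, rule-per-branch style a recursive-descent parser uses, so stepping through it in a debugger follows the token definitions directly. The `Lexer` class and `tokens` helper are illustrative names.

```python
# Hypothetical lexer rules, written the way a top-down parser's rules are:
#   INT : DIGIT+ ;
#   ID  : LETTER (LETTER | DIGIT)* ;
#   WS  : (' ' | '\t' | '\n')+ ;   (skipped)
# Any other character becomes a one-character PUNCT token.

class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def la(self):
        """One character of lookahead, or "" at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def match_while(self, pred):
        start = self.pos
        while self.la() and pred(self.la()):
            self.pos += 1
        return self.text[start:self.pos]

    def next_token(self):
        self.match_while(str.isspace)                      # WS rule, skipped
        c = self.la()
        if not c:
            return ("EOF", "")
        if c.isdigit():
            return ("INT", self.match_while(str.isdigit))  # INT rule
        if c.isalpha():
            return ("ID", self.match_while(str.isalnum))   # ID rule
        self.pos += 1
        return ("PUNCT", c)


def tokens(text):
    lexer = Lexer(text)
    out = []
    while True:
        tok = lexer.next_token()
        out.append(tok)
        if tok[0] == "EOF":
            return out
```

For example, `tokens("count = 42;")` yields an `ID`, a `PUNCT`, an `INT`, another `PUNCT` and finally `("EOF", "")`.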

Re:What are the most "standard" parser generators? (1)

Cyberax (705495) | more than 6 years ago | (#23582329)

ANTLR is about 16 years old, so it's going to stay for a long time now.

Other source (1)

LarsWestergren (9033) | more than 6 years ago | (#23574353)

If you think compiler or parser design is interesting or may conceivably write a domain-specific language for your workplace, the Definitive Antlr Reference is not only a good place to start, but one of the only places to start short of signing up for a university course.

I am currently reading Programming Language Pragmatics. It's pretty good I think, but then, I have nothing to compare with. I'll probably pick up the Antlr book too.

Re:Other source (0)

Anonymous Coward | more than 6 years ago | (#23574943)

I just finished Programming Language Processors in Java, a sequel to a few other books (this one being in Java), and it does an _excellent_ job both of conveying the material in an easy-to-understand manner and of explaining the myriad of math symbols used in describing grammars. It's truly a rare find.

I'm currently using Antlr for a DSL at work, and am going over this book as we speak!

Re:Other source (1)

LarsWestergren (9033) | more than 6 years ago | (#23582285)

Thanks AC, I will check it out. Interesting review by Steve Yegge on Amazon though.

Most Well Known? (1)

coaxial (28297) | more than 6 years ago | (#23574429)

Antlr? Never heard of it. Now Lex/YACC, I have.

Re:Most Well Known? (2, Insightful)

OrangeTide (124937) | more than 6 years ago | (#23574589)

I prefer the LEMON parser generator [hwaci.com] over yacc/bison. You can use LEMON with lex/flex or just roll your own scanner by hand (which is usually pretty easy anyways).

Re:Most Well Known? (1)

TheRaven64 (641858) | more than 6 years ago | (#23576613)

Seconded. I am currently in the middle of writing a Smalltalk JIT using LLVM and am using Lemon for the front end because it is very simple to use and, being public domain, has no license issues to think about. I wrote my own scanner, but it's only about a dozen lines of code.

I've not benchmarked the code it produces though, so I can't comment on its output.

Re:Most Well Known? (1)

Haeleth (414428) | more than 6 years ago | (#23574665)

Hence "second most well-known". As in, less well-known than YACC and its clones, but better-known than any of the other million parser generators.

I've no idea whether that's true, of course.

Re:Most Well Known? (1)

larry bagina (561269) | more than 6 years ago | (#23574717)

I've never heard of it either. Bison, YACC, CUP, LEMON, a couple other C-based ones, but not ANTLR.

Re:Most Well Known? (1, Funny)

The End Of Days (1243248) | more than 6 years ago | (#23575655)

You mean to say the submitter didn't consult with you first before making that statement? The nerve of Slashdotters these days...

Re:Most Well Known? (1)

quarrel (194077) | more than 6 years ago | (#23574761)

A common problem, that a book like this should help correct (NB: I haven't read it).

Back when we were kids, the compiler courses taught us about limitations of LL(k) grammars that, it turns out, aren't true (ok, ok - the theory was in fact correct, but the practical implications they passed on were in fact incorrect).

Enter ANTLR - it changed the game - and you should get to know it and why it did. Generic LL(k) grammars at your fingertips.

I thought I understood this stuff, because of so called "definitive" statements when I was a kid, but then someone pointed out that they'd made a break-through or two since I was an undergrad.

If you haven't already, learn about ANTLR - what you remember about parser generators HAS CHANGED (particularly if you're stuck in an LL(1) world).

--Q

Re:Most Well Known? (1)

Abcd1234 (188840) | more than 6 years ago | (#23575025)

Indeed... arbitrary lookahead in a parser is a godsend, and makes much more interesting grammars possible. Plus, the top-down parsing strategy makes it a *lot* easier to generate sensible error messages, since the parsing context is readily available.
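A minimal sketch of the point about error messages: in a hand-written top-down parser (Python, hypothetical two-rule grammar, not ANTLR output) the stack of rules being parsed is directly at hand, so an error can say where in the grammar it happened. `ContextParser` and its methods are illustrative names only.

```python
# Hypothetical grammar:
#   assign : ID '=' expr ';' ;
#   expr   : ID | INT ;

class ContextParser:
    def __init__(self, tokens):
        self.tokens = tokens        # list of (kind, text) pairs
        self.pos = 0
        self.rules = []             # stack of rules currently being parsed

    def peek_kind(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, *kinds):
        if self.peek_kind() not in kinds:
            found = (self.tokens[self.pos][1] if self.pos < len(self.tokens)
                     else "end of input")
            # The rule stack gives the parsing context for the message.
            raise SyntaxError(f"in {' > '.join(self.rules)}: expected "
                              f"{' or '.join(kinds)}, found {found!r}")
        self.pos += 1

    def assign(self):
        self.rules.append("assign")
        try:
            self.expect("ID")
            self.expect("=")
            self.expr()
            self.expect(";")
        finally:
            self.rules.pop()

    def expr(self):
        self.rules.append("expr")
        try:
            self.expect("ID", "INT")
        finally:
            self.rules.pop()
```

Parsing the tokens of `x = ;` raises `in assign > expr: expected ID or INT, found ';'`, naming both the enclosing rule and the alternatives that were viable at that point.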

multiple grammars in same program? (0)

Anonymous Coward | more than 6 years ago | (#23574611)

When I last used ANTLR there was an issue where I could not use the latest version because some vendor's jar file happened to be using an older version of ANTLR. Some other yacc-type programs generate all source code supplied with a class prefix of your choice and do not use a common library. This allows combining many parsers in the same program - even with different versions of the parser generator. Has ANTLR followed suit yet? The workaround using class loaders is a pain.

1 grammar - 1 target language (1)

El_Muerte_TDS (592157) | more than 6 years ago | (#23575067)

One thing I really don't like is that an ANTLR grammar is limited to a single target language. You can't use the same grammar to produce both a Java and a C# parser without making a few tedious changes to the grammar file.
We currently have a grammar that needs to be preprocessed a bit before we feed it to ANTLR to produce the parser. It's only about 5 lines in the grammar that need to be changed.

Re:1 grammar - 1 target language (-1, Redundant)

Anonymous Coward | more than 6 years ago | (#23576103)

Errr- source code control anyone?

Re:1 grammar - 1 target language (1)

parrt (157907) | more than 6 years ago | (#23577407)

ANTLR can generate 5 targets at the moment. perhaps you are referring to the fact that actions embedded within the grammar are in a particular language?

Re:1 grammar - 1 target language (1)

El_Muerte_TDS (592157) | more than 6 years ago | (#23583049)

The problem is as simple as this:


grammar Cps;

options {
        k = 1;
        output = AST;
        superClass = CpsParserBase;
        language = Java; /* but I also need CSharp
              and there is no way to set the target
                      outside of the grammar */
}

[...] /* For Java, not ignored when CSharp is the target */
@header {
package Composestar.Core.COPPER2;
}
@lexer::header {
package Composestar.Core.COPPER2;
} /* For CSharp, ignored when Java is the target \o/ */
@lexer::namespace {Composestar.StarLight.CpsParser}
@parser::namespace {Composestar.StarLight.CpsParser}
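The preprocessing step mentioned above could look roughly like this Python sketch. It assumes the option is written as `language = X;` and that the `@header`, `@lexer::header` and `@parser::header` blocks contain no nested braces; `retarget` is a hypothetical helper, not part of ANTLR.

```python
import re

# Matches e.g. "language = Java;" inside the options block.
LANGUAGE_OPTION = re.compile(r"(\blanguage\s*=\s*)\w+(\s*;)")
# Matches @header { ... }, @lexer::header { ... } and @parser::header { ... }
# blocks, assuming they contain no nested braces.
JAVA_HEADER = re.compile(r"@(?:lexer::|parser::)?header\s*\{[^{}]*\}\s*")


def retarget(grammar_text, language):
    """Return a copy of the grammar with its `language` option replaced."""
    text, n = LANGUAGE_OPTION.subn(
        lambda m: m.group(1) + language + m.group(2), grammar_text, count=1)
    if n != 1:
        raise ValueError("no 'language = ...;' option found")
    if language != "Java":
        # Drop the Java-only package declarations that other targets keep.
        text = JAVA_HEADER.sub("", text)
    return text
```

Run once per target before invoking ANTLR, this keeps a single checked-in grammar while the `@lexer::namespace` and `@parser::namespace` blocks pass through untouched.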

Much easier than Lex, Yacc (0)

Anonymous Coward | more than 6 years ago | (#23575129)

I had the (mis?)opportunity to start doing compilers with Yacc and Lex. It's easy to make mistakes and you're bound to C.

ANTLR takes some weight off by letting you pick the language in which it will generate code. It also has a default implementation for an AST, and as of a few weeks back you can use BNF notation to make changes to ASTs too (this feature is not covered in the book).

Having the book close by while writing new compilers was of great help. It's only really useful for your first compilers, though: you tend to use many of the features ANTLR gives you from the start, and you soon learn them by heart.

The book can certainly be improved (I know I had trouble understanding in some corner cases why ANTLR worked the way it did), but compared to the information available on the net only, it's really worth having close.

Good stuff... (4, Interesting)

Rocky (56404) | more than 6 years ago | (#23575269)

..problem is, you can't really do anything non-trivial in ANTLR 3.0 without buying the book.

They've drastically reduced the freely available documentation on their web page, so you are essentially forced to buy it.

Re:Good stuff... (1)

monkdwally (1292822) | more than 6 years ago | (#23576021)

..problem is, you can't really do anything non-trivial in ANTLR 3.0 without buying the book. They've drastically reduced the freely available documentation on their web page, so you are essentially forced to buy it.
I don't know where you get that idea from, other than a few idiot postings on the ANTLR mailing list perhaps, or maybe you were one of those and are just trolling. Anyway, it is bollocks - you think anyone working for free on a free open source project can be bothered to go DELETE documentation? Nothing has ever been removed from the web site unless it was wrong, example grammars are all over the place and there is a wiki that tells you how to do anything the book does and is added to pretty regularly. The book is just more organized and easier to follow. It seems that the price for book form documentation is reasonable for completely free (BSD) software and you don't even HAVE to buy it. The time investment writing a large grammar is quite a big investment and unless you get paid less than 50 cents an hour, or can instantly write complex grammars in less than an hour, then forty some bucks is a bargain. Of course, there are plenty of people that think a parser generation tool should write their code for them, but parsing anything of any size (syntax wise) isn't a trivial exercise. If you can't get going with the example grammars and the wiki, you should probably consider letting someone else do the work.

Re:Good stuff... (2, Informative)

JeroenFM (1259708) | more than 6 years ago | (#23576711)

As a person who owns the book and has tried working without it, I have to agree with grandparent here. The book is a must-have if you want to do serious work with version 3 of ANTLR - the v3 documentation or Wiki might contain some of the information you need for a serious grammar, but it's not presented in a consistent or useful manner. Sure, you can write a grammar without the book but unless you're intimately familiar with ANTLR, much of the online documentation just isn't all that helpful. The book on the other hand does a perfect job of explaining things.

Re:Good stuff... (2, Informative)

ghettoimp (876408) | more than 6 years ago | (#23578983)

After using ANTLR for a class long ago and being so impressed with it, I just returned to ANTLR today. I was shocked at the lack of documentation on the web site. I eventually typed "antlr reference" into google and found the following PDF: http://www.antlr.org/share/1084743321127/ANTLR_Reference_Manual.pdf [antlr.org] It's outdated and has many no-longer-supported constructs, but paired with the changes from 2.x to 3.0 it was adequate for what I needed to do. I can see nothing comparable linked from the ANTLR homepage. It seems like an obvious attempt to get you to buy the book. Oh well.

Re:Good stuff... (5, Informative)

parrt (157907) | more than 6 years ago | (#23577513)

howdy. I never deleted anything from the documentation. v3 was completely new, I simply didn't provide as much documentation as some would like. I had a simple choice to make: (1) write some free documentation for which I would not be very motivated (after doing the 5 years of 7 day/week coding effort for v3) or (2) use cash to motivate myself to write decent documentation (side benefit is that I could use the book towards getting tenure at the University of San Francisco whereas documentation does not count as a publication). Obviously I chose (2), but I understand your frustration completely. It is only like 20 bucks at Amazon though ;)

Re:Good stuff... (-1, Troll)

Anonymous Coward | more than 6 years ago | (#23577979)

Fuck you man. Pimping your shit like this.

Re:Good stuff... (1)

parrt (157907) | more than 6 years ago | (#23578061)

Uh...this whole thread is about the book, bro. Just explaining the disparity between the online documentation and the book. sorry if I offended you by repeating what the author of this review indicated: that the book was available at Amazon. damn, there I go again!

Re:Good stuff... (1)

gstamp (99104) | more than 6 years ago | (#23579261)

That's not very nice.

Personally I thought the book was great.  I got the PDF version.  The free documentation while it could be better is still okay however I'd recommend just getting the book.

Re:Good stuff... (1)

LizardKing (5245) | more than 6 years ago | (#23583811)

The fact that you directly benefit from the book is a plus if it means there's more of an incentive for you to work on ANTLR, so I've just been to Amazon to order a copy.

Re:Good stuff... (1)

paulsnx2 (453081) | more than 6 years ago | (#23584379)

Anyone that would complain about what Dr. Parr has contributed to the "compiler compiler" crowd simply hasn't tried to do work like this themselves. It is time consuming and draining to work on a project of this magnitude in addition to a job, and on top of that write decent documentation.

I couldn't quite get Antlr 2.0 to work for my Domain Specific Language application (a Decision Table based Rules Engine), and that mostly because digging through all the online documentation answered my questions at simply too slow a pace. So I did yet another Flex/CUP implementation.

But now with a book in hand, I am ready to give it another shot. I don't just appreciate Dr. Parr's efforts in putting together a book, I prefer them organized in this form.

$30 (including shipping) just isn't that much for my company to pay for a handy and effective reference.

Re:Good stuff... (1)

Rocky (56404) | more than 6 years ago | (#23587259)

I have no problem with why you did it - money and pubs are great motivators for doing something as jejune as documentation.

I just thought that you should know that it made development "interesting" (read:very trial-and-error) until someone from the lab bought the book. It might also make it more difficult for beginners to get into using the tool, although I suppose you could make it a required text if you were teaching a class.

P.S.: Love the interpreter - that by itself saves a bunch of time!

A new framework comparable to ANTLR: Gazelle (4, Interesting)

CoughDropAddict (40792) | more than 6 years ago | (#23575515)

I would encourage anyone who is interested in parsing or ANTLR to follow my project Gazelle [reverberate.org] . It is intended to be a next-gen parsing framework that builds on the ideas set forth in ANTLR but packages them in a significantly different way, which offers a lot of benefits (which I list in detail on the website).

The primary thing I am trying to deliver is reusability of parsers. The open-source community should be able to cooperate to write canonical parsers for all popular languages, but this goal is hampered by the fact that almost all parsing tools (ANTLR included) encourage you to write non-reusable grammars by having you embed actions into the grammar.

Gazelle also takes an interpreter+JIT approach instead of emitting code in an imperative language. So, for example, if you want a really fast HTTP parser from Ruby (which is precisely the raison d'etre for Mongrel), you can use the HTTP Gazelle parser from Ruby, but since the parsing is actually performed by the interpreter+JIT (written in C), you get extremely fast parsing without writing a line of C.

Gazelle is still very immature and not ready for people to try out, but I would encourage anyone who's interested to follow the Gazelle category on my blog [reverberate.org] .

You can also check out:
- the current draft of the manual [reverberate.org] , which will give you a better idea of the specifics of where I'm going with this.
- a graphical dump of the grammar for JSON [reverberate.org] , which the current development code is capable of generating.

Re:A new framework comparable to ANTLR: Gazelle (0)

Anonymous Coward | more than 6 years ago | (#23588773)

Very interesting project.

I did not see any examples of how, say, 2 different languages can bind to the canonical grammar engine gazelle produces. Is there going to be an API for each supported language to bind actions to code in the host language?

Re:A new framework comparable to ANTLR: Gazelle (1)

CoughDropAddict (40792) | more than 6 years ago | (#23590723)

Very interesting project.
Thanks!

I did not see any examples of how, say, 2 different languages can bind to the canonical grammar engine gazelle produces. Is there going to be an API for each supported language to bind actions to code in the host language?
Yes. And rather than using something like SWIG, I want a lot of thought to go into each language's binding, so the interface is really idiomatic for each language.

For a look at my very preliminary version of the C api, check out this program that is built on Gazelle 0.1. Check out the "register_callback" calls in this source file:

recs-collate.c [github.com]

As this gets more sophisticated, I want to use CSS selectors and/or XPath as a model for specifying the predicates for when callbacks are called. For example, imagine being able to specify callbacks like so:

register_callback("str_frag:unicode_char", callback); // call me when "unicode_char" is seen inside a str_frag rule
register_callback("> object", callback); // call me when object is seen, but only at the top level

ANTLR (0)

kick_in_the_eye (539123) | more than 6 years ago | (#23575695)

ANTLR - Am Now The Laziest Reader. I didn't read this article. Guess that makes me the average slashdotter.

Antlr? (0)

Anonymous Coward | more than 6 years ago | (#23575775)

I hate my uncle
but I love my antlrs
'cause antlrs are a moose's best friend!

C# parser in Java using ANTLR (0)

Anonymous Coward | more than 6 years ago | (#23576229)

The C# parser here: www.temporal-wave.com [temporal-wave.com] has an online C# parser running in Java from ANTLR. There are also Java parsers in C# from ANTLR output and Python parsers in C and C parsers in ActionScript and err now I am totally confused....

I really like ANTLR. (4, Interesting)

BillWhite (14288) | more than 6 years ago | (#23576359)

I've used both ANTLR from PCCTS 1.3 and Bison pretty extensively. We have multiple bison and ANTLR grammars in our product. I like ANTLR generally better than bison. The extended BNF is really useful. And when you get used to writing top-down grammars, they are not so odd. In fact, with eBNF notation, a lot of the peculiarity is taken away.

I also like Sorcerer. In PCCTS 1.3, Sorcerer is a kind of tree traversal grammar tool. You create ASTs, with ANTLR, and Sorcerer creates a program which will traverse them, and call action routines where you specify. It's really pretty neat.
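Sorcerer's idea, walking an AST and firing actions at chosen points, can be approximated by the much-simplified Python sketch below. It dispatches on node labels rather than matching a tree grammar as Sorcerer does, and the tuple-based tree shape and the `actions` table are assumptions made for illustration.

```python
# Hypothetical AST shape: (label, child, child, ...); leaves are ("INT", value).

def walk(node, actions):
    """Visit an AST depth-first and call the action registered for each label."""
    label, *children = node
    if label == "INT":
        return actions["INT"](children[0])
    results = [walk(child, actions) for child in children]   # children first
    return actions[label](*results)


actions = {
    "INT": lambda v: v,
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}

tree = ("+", ("INT", 1), ("*", ("INT", 2), ("INT", 3)))
print(walk(tree, actions))  # 7
```

Swapping in a different `actions` table (a pretty-printer, a code generator) reuses the same traversal, which is the separation a tree-grammar tool provides.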

I'm thinking about something else though. I'm thinking we should really think about programming with grammars more than we do. Say, for example, you have a user interface of some kind. It gets certain events, its state changes, and it reacts to the environment. A good fraction of the set of state changes can be captured with some kind of finite state machine. But a context free grammar is equivalent to a finite state machine with a pushdown list to hold context. So, it seems very likely to me that a good way to build user interfaces is to somehow compose grammars. The tokens are events, and the action routines are the non-FSM state changes.

So, why is this interesting in this discussion? Well, ANTLR from PCCTS 1.3 is a recursive descent parser, and YACC/Bison are bottom up parsers. This means that the pushdown list for ANTLR is the execution stack, and the pushdown list for YACC/Bison is in data space. It's hard to see how one would maintain multiple ANTLR-style automata concurrently, but that's what you want to do for this style of programming.

Generally YACC/Bison pushdown lists and other parsing data are kept in global variables, but there is a way to make Bison generate C++-usable grammars where the parsing data are saved in a heap-allocated object. This means they have a fixed size, which may be a problem. But it would not take a lot of work to change the parsing algorithm for Bison to make the pushdown list a linked list, and that might make things easier.
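The point about keeping the pushdown list in data space can be sketched in Python with a hypothetical event grammar for nested dialogs. Because the stack is an attribute of an object rather than the interpreter's call stack, two recognizers can be fed interleaved events, which is the "multiple automata concurrently" case described above. This illustrates the idea only; it is not Bison's or ANTLR's implementation, and the event names are assumptions.

```python
class DialogRecognizer:
    """Pushdown recognizer for a hypothetical UI event grammar:

        session : ( OPEN[name] session CLOSE[name] | CLICK )* ;

    The pushdown list lives in self.stack (ordinary heap data), not on the
    call stack, so many recognizers can be fed interleaved events."""

    def __init__(self):
        self.stack = []   # names of the dialogs currently open, innermost last
        self.log = []     # results of the CLICK action

    def feed(self, kind, name=None):
        if kind == "OPEN":
            self.stack.append(name)                   # push context
        elif kind == "CLOSE":
            if not self.stack or self.stack[-1] != name:
                raise ValueError(f"CLOSE {name!r} does not match the open dialog")
            self.stack.pop()                          # pop context
        elif kind == "CLICK":
            self.log.append("/".join(self.stack))     # action sees the whole context
        else:
            raise ValueError(f"unknown event {kind!r}")

    def accepts(self):
        """True if the events so far form a complete session."""
        return not self.stack
```

Feeding `OPEN settings`, `OPEN fonts`, `CLICK` to one recognizer while another receives a bare `CLICK` leaves the first with the log entry `settings/fonts` and the second with an empty context, each unaffected by the other.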

So, in short, I think it's pretty interesting to look at parsing, even if you're not writing compilers.

Re:I really like ANTLR. (1)

anomalous cohort (704239) | more than 6 years ago | (#23580169)

Ditto. I am also a big fan of embedding mini-languages in bigger systems and of ANTLR for all the reasons you state plus a few more [transitionchoices.com] .

Three cheers to Terence Parr for this remarkable technology.

I have not taken the step to upgrade to version 3. I hear that the grammar specifications are significantly different. I have a question. Is it worth all the rewriting of grammar and migration of scripts to upgrade? Has anyone here used ANTLRWorks? I am really pleased with version 2.7.5 so it is hard to get motivated.

Interpreting parser (1)

Frans Faase (648933) | more than 6 years ago | (#23576481)

Some years ago, I wrote an interpreting parser, which simply loads a grammar (in extended BNF) and then parses a string according to it. The lexical analysis needs to be hand-coded, but examples for the most common literals are included. The interesting bit is that the parser controls the lexer, which simply gives you context-sensitive lexing. The whole thing is rather small. The nice thing is that it doesn't generate code. I once tried to make it generate code, but the generated code was actually slower than the interpreting version. (Probably because the interpreter fits in the processor's primary cache.) The parser uses backtracking, but with some smart caching, which makes it very fast. The whole interpreting parser consists of a single C file of about 100 KB and can be found here [iwriteiam.nl] .
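The combination described above (grammar loaded as data, parser-driven lexing, backtracking with caching) can be sketched in a few dozen lines of Python. This is a packrat-style interpreter (memoized backtracking with ordered choice, as in PEGs), which may well differ from how the linked C parser actually works; the tuple encoding, `GRAMMAR`, `Interpreter`, and `parses` are hypothetical. "Lexing" here is just a literal or regex tried at whatever position the parser asks for, which is what makes it context-sensitive.

```python
import re

# Grammar expressions as tuples (hypothetical encoding):
#   ("lit", "text")  ("re", r"pattern")  ("ref", "rule")
#   ("seq", e1, e2, ...)  ("alt", e1, e2, ...)  ("star", e)
GRAMMAR = {
    "value": ("alt", ("ref", "number"), ("ref", "ident"), ("ref", "list")),
    "number": ("re", r"\d+"),
    "ident": ("re", r"[A-Za-z_]\w*"),
    "list": ("seq", ("lit", "["),
             ("alt", ("seq", ("ref", "value"),
                      ("star", ("seq", ("lit", ","), ("ref", "value")))),
              ("seq",)),                     # empty sequence = "optional"
             ("lit", "]")),
}


class Interpreter:
    def __init__(self, grammar, text):
        self.grammar, self.text = grammar, text
        self.memo = {}                       # (rule, position) -> end position or None

    def skip_ws(self, pos):
        while pos < len(self.text) and self.text[pos].isspace():
            pos += 1
        return pos

    def match(self, expr, pos):
        """Return the position after `expr` matches at `pos`, or None."""
        kind = expr[0]
        if kind == "lit":                    # the parser asks for a token here
            pos = self.skip_ws(pos)
            return pos + len(expr[1]) if self.text.startswith(expr[1], pos) else None
        if kind == "re":
            pos = self.skip_ws(pos)
            m = re.compile(expr[1]).match(self.text, pos)
            return m.end() if m else None
        if kind == "ref":                    # cache: each rule runs once per position
            key = (expr[1], pos)
            if key not in self.memo:
                self.memo[key] = self.match(self.grammar[expr[1]], pos)
            return self.memo[key]
        if kind == "seq":
            for sub in expr[1:]:
                pos = self.match(sub, pos)
                if pos is None:
                    return None
            return pos
        if kind == "alt":                    # backtrack: retry each option from pos
            for sub in expr[1:]:
                end = self.match(sub, pos)
                if end is not None:
                    return end
            return None
        if kind == "star":
            while True:
                end = self.match(expr[1], pos)
                if end is None or end == pos:
                    return pos
                pos = end
        raise ValueError(f"unknown expression kind {kind!r}")


def parses(grammar, start, text):
    interp = Interpreter(grammar, text)
    end = interp.match(("ref", start), 0)
    return end is not None and interp.skip_ws(end) == len(text)


print(parses(GRAMMAR, "value", "[1, [a, b], []]"))  # True
print(parses(GRAMMAR, "value", "[1, [a b]]"))       # False
```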

Re:Interpreting parser (1)

parrt (157907) | more than 6 years ago | (#23577039)

ANTLR v3 also has an interpreter for ANTLR grammars; it does everything except execute actions (of course), and it doesn't yet deal with syntactic predicates.

Drools and Antlr interview with Ter (0)

Anonymous Coward | more than 6 years ago | (#23577335)

Antlr 3.0 has been invaluable for Drools; we can't wait for 3.1, which has better performance.

We did an interview with Ter and put it on our blog:
http://blog.athico.com/2007/06/interview-with-antlr-30-author-terrence.html [athico.com]

Mark
http://blog.athico.com [athico.com] blog
http://labs.jboss.org/drools [jboss.org] homepage

Stupid grammar question (1)

beernutz (16190) | more than 6 years ago | (#23577453)

Is the word "painstaking" hyphenated as pain-staking or pains-taking?

I always thought of it as pains-taking.

As in:

I have taken great pains to get this right.

Just curious whether I have been using it correctly.

Re:Stupid grammar question (0)

Anonymous Coward | more than 6 years ago | (#23578529)

Yes. The word is a contraction of "pains-taking".

Much as this reply is time-wasting and space-filling.

Pains-taking (1)

Jane Q. Public (1010737) | more than 6 years ago | (#23580765)

As in, "I am taking pains to see that such-and-such is done to exacting standards..."

Cool! But who needs parsers? (1)

Jay L (74152) | more than 6 years ago | (#23577633)

About seven years ago, I needed to write a DSL. I used ANTLR to do it, and it was a pleasure - even though I didn't know Java! I wrote out the grammar, ANTLR wrote all the code for me, everyone was happy. And that was long before there was anything as amazing as ANTLRWorks.

But lately I've been wondering: Do we really need parser generators anymore? In Ruby, if you're writing a DSL, you usually implement it in terms of the Ruby language itself. I imagine that's true for other dynamic languages too. Look at Rails, or Adhearsion, or RSpec: They're DSLs, but they're valid Ruby.

And if you're not writing your own DSL, but parsing a "well-known" format, parsers have a downside there too: They're picky. If I need to scrape a bit of data out of an XHTML page, I'm better off *not* using an XML parser; it'll choke. It's usually easier to use a regex, waste some theoretical CPU, and throw away the bits I don't need.

So, parser lovers: Why do we still need cool tools like ANTLR? What do they provide that crude regular expressions don't?

Re:Cool! But who needs parsers? (0)

Anonymous Coward | more than 6 years ago | (#23577981)

What do they provide that crude regular expressions don't?
Do this with a regex:

e : (e*)
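Assuming the parentheses above are meant as literal tokens, i.e. `e : '(' e* ')'` (in ANTLR notation bare parentheses only group, which would make the rule left-recursive), this is the language of balanced parentheses. Recognizing it needs unbounded nesting memory, which a true regular expression doesn't have, and Python's standard `re` module has no recursion construct. A short recursive-descent recognizer handles it, with the Python call stack acting as the pushdown list; `match_e` and `is_e` are hypothetical names.

```python
def match_e(text, pos=0):
    """Recursive-descent matcher for the rule  e : '(' e* ')' .

    Returns the index just past one complete `e` starting at pos, or None.
    Nesting depth is limited by Python's recursion limit (about 1000 by default).
    """
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    while True:                        # e* : zero or more nested e's
        end = match_e(text, pos)
        if end is None:
            break
        pos = end
    if pos < len(text) and text[pos] == ")":
        return pos + 1
    return None


def is_e(text):
    return match_e(text) == len(text)


print(is_e("(()(()))"))  # True
print(is_e("(()"))       # False
print(is_e("()()"))      # False: two e's side by side, not one
```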

Re:Cool! But who needs parsers? (1)

Jay L (74152) | more than 6 years ago | (#23585129)

What do they provide that crude regular expressions don't?
Do this with a regex:

e : (e*)


OK, I did, but it's not done running yet...

Re:Cool! But who needs parsers? (0)

Anonymous Coward | more than 6 years ago | (#23578495)

Lisp is particularly suited to this sort of thing.

Here's an example http://bc.tech.coop/blog/050711.html [tech.coop]

Seriously? (1)

xenocide2 (231786) | more than 6 years ago | (#23578583)

Crude regular expressions only work for crude regular languages. You mention parsing XHTML, but what evidence do we have that regexes don't choke on HTML just as often as XML parsers do? HTML is complicated, and often broken enough in distribution, that I'm not sure you can define a universally accepted "well-formedness" checker. And it's not that regexes waste CPU; you can implement them in time linear in the input. It's that they break quite easily if there's any variety in your input. I recall someone complaining that the WordPress XML parser was a regex written to handle WordPress's own output, and when someone wanted to migrate from one to the other, their valid XML didn't work because the regex assumed a certain order of elements.
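A toy version of the element-order problem described above, using made-up records rather than WordPress's actual export format: a regex written against one producer's output hard-codes the child order, while a real XML parser (here the standard library's `xml.etree.ElementTree`) doesn't care. `regex_extract` and `parser_extract` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# Two equivalent, hypothetical XML records; only the child order differs.
doc_a = "<post><title>Hello</title><author>ann</author></post>"
doc_b = "<post><author>ann</author><title>Hello</title></post>"

# A regex written against one exporter's output bakes in that element order.
POST_RE = re.compile(r"<post><title>(.*?)</title><author>(.*?)</author></post>")


def regex_extract(doc):
    m = POST_RE.search(doc)
    return (m.group(1), m.group(2)) if m else None


def parser_extract(doc):
    root = ET.fromstring(doc)          # real XML parsing; child order is irrelevant
    return (root.findtext("title"), root.findtext("author"))


print(regex_extract(doc_a), regex_extract(doc_b))    # ('Hello', 'ann') None
print(parser_extract(doc_a), parser_extract(doc_b))  # ('Hello', 'ann') ('Hello', 'ann')
```

The same regex would also break on added whitespace or attributes, which the parser handles without any change.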

For most XML doctypes, I think it's useful to have a generated parser available. If you want your input to be something akin to XML and you want to accept as many files as possible, grammars and parsers are your only valid option.

Re:Cool! But who needs parsers? (2, Insightful)

blitz487 (606553) | more than 6 years ago | (#23579455)

Regular expressions cannot handle recursive grammars.

Obscure, not obtuse (1)

5pp000 (873881) | more than 6 years ago | (#23577719)

the subject matter is obtuse

No, it's not obtuse, it's obscure.

Obtuse is the antonym of acute. In geometry, an obtuse angle is one greater than 90 degrees. Metaphorically, an obtuse person is one who is not sharp.

Obscure, on the other hand, from the Latin word for "dark", means difficult to perceive or understand.

Re:Obscure, not obtuse (1)

ratbag (65209) | more than 6 years ago | (#23582175)

"Abstruse" (from the Latin abstrudere, "to conceal") would also be appropriate in this case, since it means "difficult to understand".

If I'd known this was going to be slashdotted... (0)

Anonymous Coward | more than 6 years ago | (#23579069)

..I would have trimmed my ANTLRs.

ANTLR vs Gold Parser (4, Interesting)

the-matt-mobile (621817) | more than 6 years ago | (#23579953)

On a project I was on, I needed to parse 50+ COBOL copybooks in .NET so that we could use those data definitions to whittle down a 600MB flat file full of nightly data for a data warehouse.

I tried ANTLR, and I wound up abandoning it. I wanted ANTLR to work - I really thought it seemed to be the best and most mature solution, but the documentation is ABYSMAL. And if you want to use .NET with it instead of Java, there's a reserved circle in hell for that.

I'm sure if I'd had proper documentation, I could have managed it. I have a CS degree and took a 400-level automata class as part of my curriculum, so I'm no featherweight on this topic. Heck, I've even used this [cpan.org] in a past life. But I was still at a loss as to how to even begin with ANTLR and C#.

I found Gold Parser [devincook.com] , and I was done with 151 commented lines in my grammar file, 148 lines of generated C# code in a constants file, and less than 500 lines of business logic to actually deal with the parsed copybook to do what we needed to do with it. The whole development effort was mere days, and let me quickly get out of the weeds and back to solving the real business problem.

Recently, I saw this ANTLR reference discarded on a shelf in Half Price Books, and just had to pick it up. I thought, maybe this would have gotten me there. Alas, the book is not even close to a reasonable reference. Moving from tokenizers and regular expressions to full-fledged CFGs is not a layman's topic, but there's not much help to be gleaned from the arcane drivel in Mr. Parr's book on his already overly complicated application. If you're interested in theory and wallowing in the mire of academia, then this is the book for you. But if you have a working understanding of the topic and just want to get the blasted tool to work, the best advice I have is to stay far, far away. At least, that is, until someone other than Parr himself sets forth a reasonable guide to this ANTLR'ed beast.

I found much the same with Parr's StringTemplate [stringtemplate.org] project. After trying to figure it out from the limited documentation, I found it less painful to just dredge up NVelocity from its hidden location at Castle Project [castleproject.org] , fish out the necessary DLLs, throw away the rind (who uses Castle anyway???) and happily go on my merry way.

Re:ANTLR vs Gold Parser (2, Interesting)

anomalous cohort (704239) | more than 6 years ago | (#23580275)

I am posting this simply so that others can see a different view and judge for themselves.

I have used ANTLR for years (not version 3) and have had no trouble getting it to do what I want. I have not tried to get it to interpret COBOL code, however. I have even used ANTLR in .NET and found it to be easy, breezy.

Keep in mind that this is no drag-and-drop technology for lightweights. You are really going to have to know your compiler and formal language theory and be willing to study some sample grammars [antlr2.org] . You should also be comfortable with BNF; prior experience with YACC is a plus.

Re:ANTLR vs Gold Parser (1)

the-matt-mobile (621817) | more than 6 years ago | (#23581179)

Every single link on that grammars page you provided is broken, even though a lot of the stuff says 2008. That's exactly the kind of documentation frustrations I was talking about with ANTLR. ANTLR may well be the cat's meow, but much like a cat I personally found it fickle on its best days.

In contrast, here are some of Gold Parser's grammars [devincook.com] , including the one for COBOL [devincook.com] . You can also find example grammars for things whose full scope you already understand, like phone numbers [devincook.com] . The grammars are written in a BNF dialect similar to, but incompatible with, the one ANTLR uses.

Gold might not be as mature a product, have the best architecture, or have much active development happening, but it's a good enough LALR parser with plain English documentation, which is what ANTLR could really use and what this book attempts to be. YMMV of course.

Re:ANTLR vs Gold Parser (1)

the-matt-mobile (621817) | more than 6 years ago | (#23581225)

Quick addendum - I would be remiss if I forgot to mention that my personal favorite language [codehaus.org] is implemented using ANTLR, so it isn't all bad.

Re:ANTLR vs Gold Parser (1)

drew (2081) | more than 6 years ago | (#23581483)

Try changing antlr2.org to antlr.org in the URL. It seems that in the changeover from version 2 to version 3, a lot changed around on the website, and there are now a lot of broken links on both versions of the site. It's unfortunate, because it does reflect poorly on what otherwise seems to be a pretty good project. I've not actually used ANTLR yet myself, although I may be using it in the near future. (But thanks for the pointer to GOLD; I'll look into that as well.) The site has been very valuable to me in the past, in that there used to be a PDF by the author about how to build an ANTLR-style parser by hand. As the language I was using at the time was not supported by ANTLR or any other tool that I could find, that PDF was key to my being able to complete the project I was working on. Unfortunately, I didn't think to download the PDF at the time, and it too seems to have been a casualty of the chaos that has consumed the ANTLR website of late. If that chapter is part of this book, I think I'd be willing to drop the $25 just to have a copy of it on hand.

Re:ANTLR vs Gold Parser (0)

Anonymous Coward | more than 6 years ago | (#23582109)

I guess it's how you think about things.

I've used ANTLR for proof-of-concept research and it worked fine, and I've got StringTemplate code that is used every day for a business-critical reporting app.

From my perspective, StringTemplate (I use the python implementation) is the only templating language that felt like it was designed. It's just usable.

Re:ANTLR vs Gold Parser (1)

dread (3500) | more than 6 years ago | (#23582227)

Interesting. I wrote two grammars without much hassle after buying the book last spring and they are now part of our commercial product. Sure, the book isn't exactly Dr Seuss but I much prefer a book by someone who is actually enthusiastic about his/her own subject and that goes into core concepts. And considering that I had very little previous knowledge about formal languages and/or compiler theory I find your comments about the "overly complicated application" and "arcane drivel" to be off the mark by quite a bit.
Oh, and don't start with the "but we wanted to do it for .NET" argument, because we are generating for both Java and C#, depending on the environment. Seriously, if someone like me who hasn't studied for a second beyond high school can handle it and you can't, then you have to wonder.

ANTLR vs JParsec (0)

Anonymous Coward | more than 6 years ago | (#23584191)

I found the STG library extremely useful. I recently replaced an ANTLR based compiler with a JParsec based one, and STG handled the code generation phase with aplomb. It's a little inconsistent at times, but still saved a ton of work.

JParsec is pretty damn cool too. You write your grammars as reusable, refactorable code in your favourite IDE, rather than in an external file format. It's based on Parsec, a Haskell parser combinator library, so it supports LL(k) and LL(*), as well as combining lexical and syntax analysis (so tokenizing can be done at parse time), making context-sensitive grammars possible.

We used JParsec because some of our tokens contained special characters e.g. A+, but that's another story.
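To show what "grammars as code" and "tokenizing at parse time" look like, here is a tiny Python sketch of the parser-combinator idea. It is not JParsec's or Parsec's API; `regex`, `seq`, `alt`, `many`, `fmap`, and `parse_all` are hypothetical names. The example grammar guesses that tokens like "A+" are something like letter grades: the grade rule itself decides that "+" belongs to the token, so no separate lexer has to be told about it.

```python
import re


# A parser is a function (text, pos) -> (value, new_pos) on success, or None.
def regex(pattern):
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse


def seq(*parsers):
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers):
    def parse(text, pos):
        for p in parsers:              # ordered choice: first success wins
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(parser):
    def parse(text, pos):
        values = []
        # Stop on failure, or on an empty match (which would loop forever).
        while (result := parser(text, pos)) is not None and result[1] != pos:
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def fmap(fn, parser):
    def parse(text, pos):
        result = parser(text, pos)
        return None if result is None else (fn(result[0]), result[1])
    return parse


# Hypothetical grammar: comma-separated grades such as "A+", "C-", or "pass".
grade = fmap(str.strip, alt(regex(r"\s*[A-F][+-]?"), regex(r"\s*(pass|fail)")))
comma = regex(r"\s*,")
grades = fmap(lambda v: [v[0]] + [pair[1] for pair in v[1]],
              seq(grade, many(seq(comma, grade))))


def parse_all(parser, text):
    result = parser(text, 0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"could not parse {text!r}")
    return result[0]


print(parse_all(grades, "A+, pass, C-"))  # ['A+', 'pass', 'C-']
```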

Re:ANTLR vs Gold Parser (1)

datadigger (1014733) | more than 6 years ago | (#23591865)

On a project I was on, I needed to parse 50+ COBOL copybooks in .NET so that we could use those data definitions to whittle down a 600MB flat file full of nightly data for a data warehouse. I tried ANTLR, and I wound up abandoning it.
COBOL is pretty straightforward. I did something similar with awk. One script of about 150 lines handled most of the essentials. The output files were another awk script with field offsets and lengths for use in the converted extract, a file with extraction commands for a mainframe utility, and a bare-bones SQL DEFINE TABLE script.
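The offset/length computation described above can be sketched in Python, assuming a much-simplified copybook: only elementary items with a plain DISPLAY `PIC` clause (symbols X, A, 9, with S and V taking no storage), and no COMP/COMP-3, OCCURS, REDEFINES, VALUE clauses, or column-7 indicators. `FIELD_RE`, `picture_length`, `layout`, `extract`, and the sample copybook are all hypothetical.

```python
import re

# Matches "<level> <name> PIC <picture>." in a simplified copybook line.
FIELD_RE = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)\s+PIC\s+(\S+?)\.?\s*$", re.I)


def picture_length(pic):
    """Display length of a PICTURE string such as X(20), 9(7)V99 or S9(5)."""
    length = 0
    for symbol, count in re.findall(r"([XA9SV])(?:\((\d+)\))?", pic.upper()):
        if symbol in "SV":             # implied sign / decimal point: no bytes in DISPLAY
            continue
        length += int(count) if count else 1
    return length


def layout(copybook):
    """Return (name, offset, length) for each elementary PIC field, in order."""
    fields, offset = [], 0
    for line in copybook.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue                   # group items, blank lines, anything unsupported
        _, name, pic = m.groups()
        size = picture_length(pic)
        fields.append((name, offset, size))
        offset += size
    return fields


def extract(record, fields):
    """Slice one fixed-width record into named fields."""
    return {name: record[off:off + size] for name, off, size in fields}


COPYBOOK = """
       01  CUSTOMER-REC.
           05  CUST-ID        PIC 9(6).
           05  CUST-NAME      PIC X(20).
           05  BALANCE        PIC 9(7)V99.
"""
fields = layout(COPYBOOK)
print(fields)  # [('CUST-ID', 0, 6), ('CUST-NAME', 6, 20), ('BALANCE', 26, 9)]

record = "000042" + "JANE DOE".ljust(20) + "000012345"
print(extract(record, fields)["BALANCE"])  # 000012345
```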

second-most well-known? Really? (1)

jonadab (583620) | more than 6 years ago | (#23583917)

> the second most well-known compiler compiler, Terence Parr's Antlr

I was pretty sure the two best-known compiler compilers were yacc and bison, though I couldn't have told you which one is the most well-known and which one the second-most well-known these days. (I know which is older and which is newer... but that isn't necessarily the same thing.) I've never heard of Terence Parr's Antlr before. (I _have_ heard of PGE, but only because I read Perl-related news occasionally.)

Re:second-most well-known? Really? (0)

Anonymous Coward | more than 6 years ago | (#23588965)

I was pretty sure the two best-known compiler compilers were yacc and bison
Maybe the submitter considers bison to be an implementation of yacc? Because, y'know, it is?

Buy this Book (0)

Anonymous Coward | more than 6 years ago | (#23585719)

I bought and read significant chunks of this book. It is not only an excellent reference for understanding and deploying Antlr, but a fun read for those interested in learning more about the basics of parsing, grammars, and compiler compilers. I will admit to some bias as I have known Terence since his PhD days at Purdue, when he began working on the early versions of Antlr. I've watched the software progress and mature over the years into the highly refined tool Antlr has become, and I am sure it will continue to get better.

Re: (1)

clint999 (1277046) | more than 6 years ago | (#23586091)

Uh... this whole thread is about the book, bro. Just explaining the disparity between the documentation in the book. Sorry if I offended you by repeating what the author of this review indicated: that the book was available on Amazon. Damn, there I go again!

Hmmm... (1)

erc (38443) | more than 6 years ago | (#23587141)

How is ANTLR better than YACC? Since the reviewer isn't familiar with the grand-daddy of compiler-compiler tools, how can one take the review seriously? As for the statement "there's nothing else like it out there", that's just plain fiction - there are a number of compiler-compiler books out there, especially dealing with YACC.

Re:Hmmm... (0)

Anonymous Coward | more than 6 years ago | (#23588331)

Since the reviewer isn't familiar with the grand-daddy of compiler-compiler tools, how can one take the review seriously?
Um... because it's not about yacc?

C++ Alternatives (0)

Anonymous Coward | more than 6 years ago | (#23587571)

A nice review! I will have to check out this book. I like ANTLR, and all things parsing in general.

Folks in C++ land who are considering lex and yacc may wish to explore Boost.Spirit. The latest version is especially interesting.