Regular Expression Recipes

timothy posted more than 9 years ago | from the prune-talkin' dept.

Programming 258

r3lody writes "If you spend time writing applications that have to do pattern matches and/or replacements, you know about some of the intricacies of regular expressions. For many people they can be an arcane hodgepodge of odd characters that somehow manage to do wonderful things, but they don't have enough time (or interest) to really understand how to code them. Nathan A. Good has written Regular Expression Recipes: A Problem-Solution Approach for those people. In its relatively slim 289 pages, he offers 100 regular expressions in a cookbook format, tailored to solve problems in one of six broad categories (Words and Text, URLs and Paths, CSV and Tab-Delimited Files, Formatting and Validating, HTML and XML, and Coding and Using Commands)." Read on for the rest of Lodato's review.

Regular expressions are not restricted to just the Perl or shell environments, so Nathan offers variations for Python, PHP, and VIM as well. In most cases the translation is relatively straightforward, but in a few cases a different environment may have (or lack) additional facilities, prompting a different expression to do the same task.

Before you even read chapter 1, Nathan provides a quick summary course on regular expressions, with detail given to each of the five environments you might utilize. He has written the syntax overview in a highly readable format, making it easy to understand the gobbledygook of the most bizarre concoctions you might encounter.

The first chapter (Words and Text) starts simply enough. He gives examples of how to find single words, multiple words, and repeated words, along with examples of how to replace various detected strings with others. In each case he gives an example of its use for each platform, followed by a bit-by-bit breakdown of how it works. Not every environment is covered in every example, and in many cases the "How It Works" section refers to the first one, as most REs are identical between the platforms.
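A minimal Python sketch of the repeated-word recipe described above (the pattern is an illustration, not taken from the book): the backreference \1 requires the second word to repeat the first, and re.IGNORECASE makes "the The" count as a repeat.

```python
import re

# \b(\w+) captures a whole word; \s+\1\b requires the same word again after whitespace.
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_doubled_words(text: str) -> list:
    """Return each word that is immediately repeated, e.g. 'the the'."""
    return DOUBLED_WORD.findall(text)


def collapse_doubled_words(text: str) -> str:
    """Replace 'word word' with a single copy, keeping the first spelling."""
    return DOUBLED_WORD.sub(r"\1", text)


print(find_doubled_words("This is the the end"))          # ['the']
print(collapse_doubled_words("Paris in the The spring"))  # Paris in the spring
```

The \b anchors keep the pattern from treating the tail of "This" and the following "is" as a repeat.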

The next chapter (URLs and Paths) offers various methods of doing commonly needed parsing. Pulling out file names, query strings, and directories, as well as reconstructing them in useful fashions is covered in the 15 offerings given here. Validating, converting, and extracting fields of CSV and tab-delimited files are handled in chapter 3, while chapter 4 is concerned with validating field formats, as well as re-formatting text for the fields. Chapter 5 handles similar tasks for HTML and XML documents. The final chapter covers expressions that facilitate the management of program code, log files, and the output of selected commands.
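As an illustration of the URL and path parsing described above, a single Python regex (a hypothetical example, not the book's) can split a URL or path into directory, file name, query string, and fragment. For full URLs, the standard library's urllib.parse.urlsplit is usually the safer choice.

```python
import re
from urllib.parse import urlsplit

URL_PARTS = re.compile(
    r"^(?P<dir>(?:[^?#]*/)?)"   # up to and including the last '/' before the first '?' or '#'
    r"(?P<file>[^/?#]*)"        # final path segment (the file name); may be empty
    r"(?:\?(?P<query>[^#]*))?"  # optional query string after '?'
    r"(?:#(?P<fragment>.*))?$"  # optional fragment after '#'
)

print(URL_PARTS.match("/docs/a/report.pdf?x=1#top").groupdict())
# {'dir': '/docs/a/', 'file': 'report.pdf', 'query': 'x=1', 'fragment': 'top'}

# For full URLs the standard library does the same job with fewer surprises.
print(urlsplit("http://example.com/docs/report.pdf?x=1").path)  # /docs/report.pdf
```

Named groups keep each piece addressable, which also makes "reconstructing them in useful fashions" a matter of reassembling groupdict() values.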

First, I must admit that there are a number of useful solutions provided, especially for someone who is concerned with application and web development. However, I did feel a little cheated by the fact that several recipes covered essentially the same task, with only minor variations. It almost seemed as though the author was trying to pad out the solution count to the magic number 100. A simple example: three solutions in chapter one cover (a) replacing smart quotes with straight quotes, (b) replacing copyright symbols with the "(c)" trigraph, and (c) replacing trademark symbols with the "(tm)" sequence. In each case, the expression was simply "s/\xhh/rep/g;". Did we really need three separate recipes for that? I don't think so.
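The reviewer's point shows up clearly in code: the three recipes differ only in the character and its replacement, so one lookup table covers all of them. A Python sketch, assuming the "smart" characters are the Unicode curly quotes, U+00A9, and U+2122:

```python
import re

# One lookup table instead of three near-identical s/\xhh/rep/g recipes.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u00a9": "(c)",               # copyright sign
    "\u2122": "(tm)",              # trademark sign
}
# A character class built from the table's keys matches any of them in one pass.
PATTERN = re.compile("[" + "".join(REPLACEMENTS) + "]")


def to_plain_marks(text: str) -> str:
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)


print(to_plain_marks("\u201cHi\u201d \u00a9 Acme\u2122"))  # "Hi" (c) Acme(tm)
```

str.maketrans/str.translate would do the same job without a regex at all.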

Another quibble revolves around some of the coding of the expressions. Nathan has made liberal use of the non-capturing groups (that is, (?: expr )) to ensure only the items that needed replacement were captured. While a worthy idea, in some cases the expression could have been simplified for understanding. Another issue is a slight error in searching for letters. In a number of expressions, Nathan uses [A-z] to capture all letters. Unfortunately, the special characters [, \, ], ^, _, and ` occur between upper-case Z and lower-case a, making it match too much. Either [[:alpha:]] or [A-Za-z] should have been used.
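Both points can be checked quickly in Python (the sample strings are made up for illustration): [A-z] spans ASCII 65 to 122 and so picks up the six punctuation characters between Z (90) and a (97), while a non-capturing (?:...) group lets the one capture that matters stay as group 1.

```python
import re

# 'Z', then the six ASCII characters between 'Z' (90) and 'a' (97), then 'a'.
sample = "Z[\\]^_`a"

print(re.findall(r"[A-z]", sample))     # every character: the range is too wide
print(re.findall(r"[A-Za-z]", sample))  # ['Z', 'a']

# (?:...) groups the alternation without capturing, so group(1) is the name.
m = re.search(r"(?:Mr|Ms|Dr)\.\s+([A-Za-z]+)", "Dr. Good wrote it")
print(m.group(1))  # Good
```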

Despite these quibbles, Regular Expression Recipes does provide a useful compendium of solutions for common problems developers face. Presenting the information in a cookbook fashion, along with ensuring that those using something other than Perl don't have to sweat translating the expressions to their target language, makes this a handy book to have. I wouldn't hesitate to recommend it.


You can purchase Regular Expression Recipes: A Problem-Solution Approach from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.


fp. (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#12014935)

ya

Curious (3, Funny)

LiquidCoooled (634315) | more than 9 years ago | (#12014937)

I was performing a strange custom regular expression on the book review, and discovered that it outputted the following:

"Regex coders are in league with the devil"

Who woulda thunk it!

Re:Curious (1)

LordoftheWoods (831099) | more than 9 years ago | (#12015497)

interesting, so what secret regular expression construct matches what is nowhere in the original string?

Re:Curious (1)

ThomasFlip (669988) | more than 9 years ago | (#12015570)

/R[^e]*e[^g]*g[^e]*e[^x]*x[^ ]* [^c]*c[^o]*o/ etc....
Would probably do it...
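ThomasFlip's pattern generalizes to any phrase: each character, in order, preceded by a run of anything except that character. A small Python helper (hypothetical, written for illustration) that builds such a pattern:

```python
import re


def scattered_phrase_pattern(phrase: str) -> str:
    """Build R[^e]*e[^x]*x...: each character of `phrase`, in order,
    with any run of *other* characters allowed before it."""
    parts = []
    for i, ch in enumerate(phrase):
        c = re.escape(ch)
        # [^c]* skips ahead only as far as the next occurrence of ch.
        parts.append(c if i == 0 else f"[^{c}]*{c}")
    return "".join(parts)


pattern = scattered_phrase_pattern("Rex")
print(pattern)                              # R[^e]*e[^x]*x
print(bool(re.search(pattern, "Red fox")))  # True
print(bool(re.search(pattern, "Red dog")))  # False
```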

cant get used to them (1, Informative)

alienfluid (677872) | more than 9 years ago | (#12014942)

regular expressions are nice and all but i still cant get used to them .. a good manual should be kept handy at all times. Visit Lafayette Linux Users Group at http://lug.lafayette.edu [lafayette.edu] . Suggestions are welcome.

Re:cant get used to them (0)

Anonymous Coward | more than 9 years ago | (#12015197)

Learn the basics and then practice, practice and practice again.

Regexps sure look scary to start with, but if you practice enough they'll grow on you.

Then once you've got the basics covered, re-read your preferred manual/tutorial, pick up a couple of new techniques and practice them.

Unless you're using them for really performance intensive stuff, there's nothing wrong with using a sub-optimal regexp, provided you understand what's happening. You can always improve your technique later.

I've used regexps pretty much daily for over 10 years, and I still pick up a better way of doing something once in a while.

Re:cant get used to them (2, Informative)

Anonymous Coward | more than 9 years ago | (#12015339)

regular expressions are nice and all but i still cant get used to them .. a good manual should be kept handy at all times. [ ... ]
Suggestions are welcome.

I have a suggestion. Write a few regular expressions to get your brain refreshed on them, then go read this excellent article [plover.com] on how regular expressions work. At the very least, it will clear some confusing things up. Most likely you'll find that having a better understanding of the underlying concepts will make it easier for you to work with regular expressions day to day.

Also, it helps if you are familiar with finite state machines. I learned about them in a couple of classes while getting my CS degree, but they're not that hard and most people should be able to grasp them without any kind of formal CS training.

Re:cant get used to them (2, Informative)

Waffle Iron (339739) | more than 9 years ago | (#12015349)

regular expressions are nice and all but i still cant get used to them

They may be kind of hard to get used to, but not as hard as writing, debugging and maintaining a dozen or more lines of custom string parsing code for each case where you would use one.

Re:cant get used to them (4, Informative)

halber_mensch (851834) | more than 9 years ago | (#12015350)

A good starting point is to understand finite automata and regular languages first. See http://en.wikipedia.org/wiki/Automata_theory/ [wikipedia.org] for a good first reference on automata. If you can grok automata, regular expressions will click with you.
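To make the automata connection concrete, here is a Python sketch (the language chosen is just an example): binary strings with an even number of 1s are matched by the regex 0*(?:10*10*)* and also accepted by a two-state DFA, and the loop checks that the two agree on every string up to length 8.

```python
import re
from itertools import product

# Regular expression for binary strings containing an even number of 1s.
EVEN_ONES = re.compile(r"0*(?:10*10*)*")


def even_ones_dfa(s: str) -> bool:
    """Two-state DFA: state 0 = even number of 1s so far, state 1 = odd."""
    state = 0
    for ch in s:
        if ch == "1":
            state ^= 1      # reading a 1 flips between the two states
        elif ch != "0":
            return False    # symbol outside the alphabet {0, 1}
    return state == 0       # state 0 is the only accepting state


# The regex and the automaton accept exactly the same strings (checked up to length 8).
for n in range(9):
    for bits in product("01", repeat=n):
        s = "".join(bits)
        assert even_ones_dfa(s) == bool(EVEN_ONES.fullmatch(s))
print("regex and DFA agree")
```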

I enjoy... (-1, Offtopic)

TheRealMindChild (743925) | more than 9 years ago | (#12014945)

Stabbing old women with sharpened sporks from KFC

At least it's not a dupe! Then again, it's not CmdrTaco!

Points (4, Informative)

2.7182 (819680) | more than 9 years ago | (#12014946)

I really liked this book, but

1. the binding broke
2. the index has a lot of typos.

Re:Points (2, Funny)

LiquidCoooled (634315) | more than 9 years ago | (#12014959)

2. the index has a lot of typos.

No problem, the website issued a global regex and a pot of Tipp-Ex for all customers.

Include "$35 cover price" or $24 after discount (0)

Anonymous Coward | more than 9 years ago | (#12015050)

Why can't a book review for an available book include the COVER PRICE? /. editors should reject these reviews if they omit the cover price.

Re:Include "$35 cover price" or $24 after discount (0)

Anonymous Coward | more than 9 years ago | (#12015390)

Oh noes, I'm too stupid to look it up on amazon or click the "make money for /. at B&N" link!

Bran... (3, Funny)

Anonymous Coward | more than 9 years ago | (#12014950)

...is the best regular recipe.

Another one? (2, Insightful)

cmstremi (206046) | more than 9 years ago | (#12014958)

Isn't there already enough coverage of regexes? With all the existing books and the nearly endless availability of free information and sites (including many using the 'recipe' format) online, who will want this book?

Re:Another one? (1)

scrotch (605605) | more than 9 years ago | (#12015503)

I don't know if this book would satisfy it, but personally, I'm tired of finding regex references that don't provide (or don't claim to provide) complete, working expressions. It seems like a pretty common occurrence to want to check that an entered email address could actually be an email address, but every regex tutorial/reference I have wimps out. They all say that their example is 'just for learning' or 'needs to be checked' or some such.

A cookbook approach to regexes seems great to me. Look up the one you want if you're in a hurry, stop and study it if you want to really understand it.

If you know of a similar online reference, I'd love to know. It seems like there should be one out there.

Re:Another one? (1)

northcat (827059) | more than 9 years ago | (#12015508)

If you don't want this, don't buy it.

Re:Another one? (3, Informative)

carnivore302 (708545) | more than 9 years ago | (#12015675)

I don't think there is a need for another book on regexps, since there is already the excellent Mastering Regular Expressions [amazon.com] by Jeffrey Friedl. What else but the best can you expect from an O'Reilly book?

Regular expressions in a cookbook? (5, Informative)

DeadSea (69598) | more than 9 years ago | (#12014962)


Sounds like good eating. ;-)

Regular expressions are great, but once you know them and you think you can conquer the world, I find they occasionally let you down. The text editor I was using had a rudimentary regular expression search that did not support non-greedy matching. I found writing a regular expression that finds C-style /* comments */ to be quite tricky with only greedy matching [ostermiller.org] . I wrote it up as an article where I build the expression piece by piece, showing common things you might try that won't work.

If you want more of a challenge, try writing a regular expression that finds any <script></script> tags along with anything in between using only greedy matching. You will find that the length of your regular expression goes up exponentially with the length of your ending condition.
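For the C comment case, a widely cited "unrolled loop" pattern gets by with greedy quantifiers only. A Python sketch comparing it with the naive greedy /\*.*\*/, which runs from the first /* to the last */ (the source line is invented for illustration):

```python
import re

#   /\*                 opening "/*"
#   [^*]*               any non-star characters
#   \*+                 one or more stars
#   (?:[^/*][^*]*\*+)*  a char that is neither '/' nor '*', more non-stars, stars again
#   /                   a '/' right after a run of stars closes the comment
C_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")
NAIVE = re.compile(r"/\*.*\*/", re.DOTALL)

src = "a = 1; /* one */ b = 2; /* two ** three */ c = 3;"
print(C_COMMENT.findall(src))  # ['/* one */', '/* two ** three */']
print(NAIVE.findall(src))      # ['/* one */ b = 2; /* two ** three */']
```

Like any regex-only approach, this ignores context: a /* inside a string literal would still be taken as the start of a comment.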

--
Calculator for Converting Currency [ostermiller.org]

Re:Regular expressions in a cookbook? (4, Interesting)

interiot (50685) | more than 9 years ago | (#12015060)

Yup, regular expressions are not capable of a full-range of computing... they're pretty close [wikipedia.org] (they're the lowest of four in the Chomsky hierarchy), but still have a few limitations that can't be resolved without wrapping some extra code around them.

It still boggles my mind that people knew this in 1956 though.

Re:Regular expressions in a cookbook? (5, Informative)

merlyn (9918) | more than 9 years ago | (#12015194)

Yup, regular expressions are not capable of a full-range of computing
That's the "classic" regular expressions, not the modern regular expressions accepted by PCRE, and Perl itself. In fact, Perl regular expressions are full Turing machines, with PCRE being a few steps behind that. So PCRE isn't really PCRE... it's P-likeCRE. {grin}
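A quick Python illustration of the extra power merlyn describes: a backreference lets (.+)\1 match exactly the strings made of some text repeated twice, a language no classic regular expression can describe. (Python's built-in re has backreferences but not Perl's recursive patterns.)

```python
import re

# (.+)\1 matches some text followed by an exact copy of itself ("ww").
SQUARE = re.compile(r"(.+)\1")

for s in ["abcabc", "abcab", "xx", "x"]:
    print(s, bool(SQUARE.fullmatch(s)))
# abcabc True
# abcab False
# xx True
# x False
```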

Re:Regular expressions in a cookbook? (3, Insightful)

interiot (50685) | more than 9 years ago | (#12015291)

You mean all the sections of the perl regexp manual [cpan.org] that say "WARNING: This extended regular expression feature is considered highly experimental, and may be changed or deleted without notice" and then go on to say things that make my head truly ache?

I personally treat this like I do Perl5 threads... as something to be afraid of, and hopeful that things will be much improved in Perl 6.

Re:Regular expressions in a cookbook? (0)

Anonymous Coward | more than 9 years ago | (#12015490)

Don't quote communists.

Re:Regular expressions in a cookbook? (2, Interesting)

pcraven (191172) | more than 9 years ago | (#12015198)

This [regular-expressions.info] is a cool article on catastrophic backtracking. I remember the first time that got me. It would occasionally cause severe issues on a production server we had. I swung and missed with my reg ex on that one.
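A minimal Python sketch of the classic catastrophic case (not necessarily the one from the linked article): a nested quantifier such as (a+)+ gives the engine exponentially many ways to split a run of a's, and it tries them all when the match ultimately fails. Keep the input short when experimenting with the nested version.

```python
import re

# Nested quantifier: (a+)+ can split a run of n 'a's in 2**(n-1) ways,
# and a failing match makes the backtracking engine try them all.
NESTED = re.compile(r"^(a+)+$")
# Same language without nesting: there is only one way to consume the run.
FLAT = re.compile(r"^a+$")

short_bad = "a" * 12 + "!"      # keep this small for NESTED; each extra 'a' roughly doubles the work
long_bad = "a" * 100_000 + "!"  # FLAT rejects this quickly

print(NESTED.match(short_bad))  # None
print(FLAT.match(long_bad))     # None
```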

Re:Regular expressions in a cookbook? (2, Interesting)

prockcore (543967) | more than 9 years ago | (#12015590)

I've been doing regex for a long time (over 10 years), and the best rule I can give newbies to follow is "match less, not more"

Write your regex's so that they generalize as little as possible.

For example, to match an XML tag, use /<[^>]+>/ instead of /<.*?>/

If you're using ".*?" in a regex, you might want to look at rewriting it.. it's almost never needed and almost always causes problems.

Re:Regular expressions in a cookbook? (2, Interesting)

prockcore (543967) | more than 9 years ago | (#12015644)

(damn, I should really preview sometimes)

The examples I gave are: /<[^>]+>/ instead of /<.*?>/
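A short Python check (example string invented for illustration) shows why the negated class is the safer choice: a lazy .*? is only "as short as possible" from a given starting point, so it will still cross tag boundaries to make the rest of the pattern succeed.

```python
import re

text = "<a>y<b>x"

# Lazy: starting at the leftmost '<', .*? grows until '>x' can follow,
# crossing the first tag to get there.
print(re.search(r"<.*?>x", text).group())    # <a>y<b>x
# Negated class: [^>]+ can never cross a '>', so the match stays inside one tag.
print(re.search(r"<[^>]+>x", text).group())  # <b>x
```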

Re:Regular expressions in a cookbook? (1)

wirelessbuzzers (552513) | more than 9 years ago | (#12015298)

If you want more of a challenge, try writing a regular expression that find any <script></script> tags along with anything in between using only greedy matching. You will find that the length of your regular expression goes up exponentially with the length of your ending condition.

Actually, they grow quadratically:
s{<script[^>]*>
(
|[^<]
|<[^/]
|</[^s]
|</s[^c]
|</sc[^r]
|</scr[^i]
|</scri[^p]
|</scrip[^t]
|</script[^>]
)*
</script>}{}gix;

Re:Regular expressions in a cookbook? (2, Informative)

DeadSea (69598) | more than 9 years ago | (#12015496)


Your expression fails for this case:

<script><scri</script>

It will match <scri< with your |</scri[^p] rule and then go on to match beyond the end of your regular expression.

But I acknowledge that it may be quadratic rather than exponential even with a correct regular expression.

--
Exchange Rate Calculator [ostermiller.org]
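For the record, a greedy-only pattern that stops at the first </script> can be built mechanically from the states of a matcher for the closing tag (KMP-style: every '<' restarts the attempt). The Python sketch below is one way to do it, assuming a lowercase, case-sensitive closing tag; it is checked against the non-greedy <script[^>]*>.*?</script>. For contrast, wirelessbuzzers' alternation (with the /x whitespace removed) finds nothing when a partial closing tag sits right before the real one, because its </scri[^p] branch swallows the '<' of the real closing tag.

```python
import re

CLOSE = "/script>"  # the closing tag without its leading '<'


def greedy_only_script_regex():
    """Build a <script>...</script> pattern using only greedy quantifiers.

    The body may never contain "</script>". After each '<' we are part-way
    through a possible closing tag; a further '<' restarts that attempt.
    """
    prefixes = [CLOSE[:k] for k in range(len(CLOSE))]  # "", "/", "/s", ..., "/script"
    # A partial tag followed by another '<' (start the closing-tag match over).
    restart = "(?:" + "|".join(re.escape(p) + "<" for p in prefixes) + ")*"
    # Leave a partial tag: a char that neither continues it nor is '<'.
    exits = "|".join(
        re.escape(p) + "[^" + re.escape(nxt) + "<]" for p, nxt in zip(prefixes, CLOSE)
    )
    body = "(?:[^<]|<" + restart + "(?:" + exits + "))*"
    # The body may also end part-way through a tag, e.g. "...</scr" + "</script>".
    tail = "(?:<" + restart + "(?:" + "|".join(map(re.escape, prefixes)) + "))?"
    return re.compile("<script[^>]*>" + body + tail + "</script>")


GREEDY_ONLY = greedy_only_script_regex()
LAZY = re.compile(r"<script[^>]*>.*?</script>", re.DOTALL)
# wirelessbuzzers' alternation with the /x whitespace removed.
ALTERNATION = re.compile(
    r"<script[^>]*>(?:[^<]|<[^/]|</[^s]|</s[^c]|</sc[^r]|</scr[^i]"
    r"|</scri[^p]|</scrip[^t]|</script[^>])*</script>"
)

html = "<script></scri</script>"
print(GREEDY_ONLY.search(html).group())  # <script></scri</script>
print(LAZY.search(html).group())         # <script></scri</script>
print(ALTERNATION.search(html))          # None
```

The exit and restart alternatives together grow with the sum of the prefix lengths, so the pattern's size is roughly quadratic in the length of the closing tag, in line with wirelessbuzzers' estimate.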

Email RegEx (1)

tquinlan (868483) | more than 9 years ago | (#12014964)

I'm still looking for a good email regex, one that checks all forms of email addresses, including all the TLDs, and all the other various complicated forms email addresses can take.

Re:Email RegEx (0)

mqRakkis (521550) | more than 9 years ago | (#12015033)

Maybe not the perfect one, but pretty good anyway (the code around is PHP):

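function isValidEmailString($email)
{
    // preg_match replaces the old eregi (removed in PHP 7); the /i flag keeps it case-insensitive
    return preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/i', $email) === 1;
}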

Re:Email RegEx (0)

Anonymous Coward | more than 9 years ago | (#12015210)

unless im reading that wrong, it only works with .?? and .??? tlds (like .com, .cx, etc) but not .info or anything larger than 3 characters
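A Python version of the same check with the TLD quantifier relaxed from {2,3} to {2,}, so .info and longer TLDs pass. It remains a rough plausibility test rather than an RFC-complete validator: it still rejects some legal addresses, such as plus-addressed ones.

```python
import re

# Python translation of the PHP pattern above, with {2,3} relaxed to {2,}.
EMAIL = re.compile(
    r"[_a-z0-9-]+(?:\.[_a-z0-9-]+)*"  # local part: dot-separated runs
    r"@[a-z0-9-]+(?:\.[a-z0-9-]+)*"   # domain labels
    r"\.[a-z]{2,}",                   # top-level domain, two or more letters
    re.IGNORECASE,
)


def looks_like_email(address: str) -> bool:
    return EMAIL.fullmatch(address) is not None


print(looks_like_email("user@example.info"))     # True
print(looks_like_email("user+tag@example.com"))  # False, although the address is legal
```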

Re:Email RegEx (1)

tehshen (794722) | more than 9 years ago | (#12015099)

As you specified all forms of e-mail addresses...

(I would post one here, but the lameness filter hates it, so I'll just link to it [regexlib.com] ).

Covers RFC 8288, as well as IP addresses.

Re:Email RegEx (2, Interesting)

Sir_Real (179104) | more than 9 years ago | (#12015206)

I'm still looking for a good email regex

Well, you asked for it [ex-parrot.com] .

Actually, I asked for it last week, in #linux on freenode. Scary huh?

Re:Email RegEx (0)

Anonymous Coward | more than 9 years ago | (#12015292)

Bah, write a verify function in PHP or something that opens a socket to the mail server/email addy specified, and ask it if it's valid. You can use the SMTP protocol to verify an address this way.

Re:Email RegEx (1)

ssbljk (450611) | more than 9 years ago | (#12015654)

good RegEx library can be found at http://www.regexlib.com/

A language in their own right. (0)

Sheetrock (152993) | more than 9 years ago | (#12014983)

Regular expressions are probably the first Turing-complete language to be encapsulated in another Turing-complete language (C).

Unless of course you count machine language interactions with higher-level languages they implement, but I'm not. :)

Re:A language in their own right. (1, Informative)

APDent (81994) | more than 9 years ago | (#12015147)

Regular expressions are not Turing complete.

Re:A language in their own right. (2, Informative)

smoany (832744) | more than 9 years ago | (#12015166)

Um, last time I checked, Reg. Exp's are not Turing complete. Take the language 0^n 1^n, which can be recognized by Turing machines. If you can make that for me using a regular expression, you deserve a Turing Award. Regular expressions are DFA/NFA complete, not Turing complete... not even close!
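A Python sketch of smoany's example: recognizing 0^n 1^n means remembering n, which no fixed set of DFA states can do for every n. The closest regular pattern, 0*1*, checks the shape but not the counts.

```python
import re


def is_zeros_then_ones(s: str) -> bool:
    """Accept 0^n 1^n (n >= 0): n zeros followed by exactly n ones."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * n + "1" * n


# The closest regular pattern checks the shape but cannot compare the counts.
SHAPE_ONLY = re.compile(r"0*1*")

for s in ["0011", "0111", "01", "10", ""]:
    print(repr(s), is_zeros_then_ones(s), bool(SHAPE_ONLY.fullmatch(s)))
# '0011' True True
# '0111' False True
# '01' True True
# '10' False False
# '' True True
```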

Re:A language in their own right. (3, Informative)

khrtt (701691) | more than 9 years ago | (#12015173)

Regular expressions are probably the first Turing-complete language to be encapsulated in another Turing-complete language (C).

Don't you just love to sound like a Star Trek character, with all that fancy terminology?

Go look up your complexity book - if you have one - regexes are not even close to Turing-complete.

REGEX (5, Funny)

null etc. (524767) | more than 9 years ago | (#12015003)

Another quibble revolves around some of the coding of the expressions. Nathan has made liberal use of the non-capturing groups (that is, (?: expr )) to ensure only the items that needed replacement were captured. While a worthy idea, in some cases the expression could have been simplified for understanding.

I'm not sure I understand what your quibble is - do you dislike the fact that he uses non-capturing groups, or the fact that he disposes of them at certain points?

Another issue is a slight error in searching for letters. In a number of expressions, Nathan uses [A-z] to capture all letters. Unfortunately, the special characters [, \, ], ^, _, and ` occur between upper-case Z and lower-case a, making it match too much. Either [[:alpha:]] or [A-Za-z] should have been used.

This seems like a relatively novice mistake, and I'm surprised it would show up in a book on regular expressions.

Despite these quibbles, Regular Expression Recipes does provide a useful compendium of solutions for common problems developers face. Presenting the information in a cookbook fashion, along with ensuring that those using something other than Perl don't have to sweat translating the expressions to their target language, makes this a handy book to have. I wouldn't hesitate to recommend it.

It's nice that he covers five environments for regular expressions. I'm sure everyone has heard of Mastering Regular Expressions [oreilly.com] , published by O'Reilly. The Perl Cookbook [oreilly.com] also does a good job at solving common problems with Regular expressions.

This is just my opinion, but I think what the world needs is a book on Regular Expression Design Patterns.

Re:REGEX (0)

Anonymous Coward | more than 9 years ago | (#12015317)

This is just my opinion, but I think what the world needs is a book on Regular Expression Design Patterns.

Jesus fuck, you left out all the key buzz words. Try: "Unleashing the Bible of Extreme Agile Regular Expression Design Anti-Patterns for Dummies in 24 Femtoseconds"?

Re:REGEX (0)

Anonymous Coward | more than 9 years ago | (#12015678)

No, the parent post sounds right. For example, a regex design pattern for "matching nested closing delimiters to opening delimiters" could apply to both data formats, such as HTML, or programming languages, whose opening and closing delimiters need to be parsed in order to rule out things such as escape sequences, which is a very difficult challenge in regex without parsing.

Unacceptable mistakes (5, Interesting)

gniv (600835) | more than 9 years ago | (#12015008)

In a number of expressions, Nathan uses [A-z] to capture all letters.

How can this be a good book when it makes such mistakes? If this book is for beginners (as it seems) the editing process should have been much better.

Re:Unacceptable mistakes (-1, Redundant)

Anonymous Custard (587661) | more than 9 years ago | (#12015122)

In a number of expressions, Nathan uses [A-z] to capture all letters.

How can this be a good book when it makes such mistakes?


Why is [A-z] wrong, and what's the correct way to do it?

Re:Unacceptable mistakes (2, Insightful)

tehshen (794722) | more than 9 years ago | (#12015209)

[A-z] accepts all characters from A to z, including [ \ ] ^ _ and `. You want [A-Za-z] or \w (latter for 'not punctuation').

Re:Unacceptable mistakes (2, Informative)

hattmoward (695554) | more than 9 years ago | (#12015253)

\w is [A-Za-z0-9_]. The reviewer mentions use of the POSIX character class [[:alpha:]], which is more in line with what you want, and will (is supposed to) match alpha characters in non-ASCII character sets.

Re:Unacceptable mistakes (1)

tehshen (794722) | more than 9 years ago | (#12015305)

I didn't know about [[:alpha:]], thanks. \w varies between each implementation, apparently - this screenshot [regular-expressions.info] shows it matching foreign characters with accents and stuff.

Though I would use [A-Za-z0-9_] just to be on the safe side.

Re:Unacceptable mistakes (1)

lgw (121541) | more than 9 years ago | (#12015435)

As someone who's had the misfortune to work with EBCDIC, I'd point out that [[:alpha:]] is the only cross-platform answer, otherwise you can get special characters even in [A-Z], and you probably want non-ASCII alphabetics in any case.
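For Python users, a caveat based on the standard re module: it does not support POSIX classes such as [[:alpha:]]. A common stand-in for "letters" in Unicode string patterns is [^\W\d_], that is, word characters minus digits and underscore. Treat it as an approximation; it may also admit a few non-letter characters such as superscript digits.

```python
import re

# Word characters, minus digits and underscore: roughly "any Unicode letter".
LETTER = re.compile(r"[^\W\d_]")

text = "\u00c5 \u00e9 z 7 _ ^"  # 'Å', 'é', 'z', a digit, underscore, caret

print(LETTER.findall(text))           # ['Å', 'é', 'z']
print(re.findall(r"[A-Za-z]", text))  # ['z']
```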

Re:Unacceptable mistakes (1)

hankwang (413283) | more than 9 years ago | (#12015235)

Why is [A-z] wrong, and what's the correct way to do it?
$ echo '^' | grep '[A-z]' # wrong
^
$ echo '^' | grep '[A-Za-z]' # correct
$ _
In the ASCII code table, the uppercase letters A-Z are followed by a number of special symbols, then followed by the lowercase letters a-z. The pattern [A-z] matches all characters that are between A and z in the ASCII table, including those symbols, which is usually not what you want.

Re:Unacceptable mistakes (1)

BinLadenMyHero (688544) | more than 9 years ago | (#12015240)

There are other chars between 'Z' and 'a'.
The correct way is '[A-Za-z]'.

Re:Unacceptable mistakes (1)

roman_mir (125474) | more than 9 years ago | (#12015247)

[a-zA-Z] - this is the correct way to do it.

BTW. regular expressions present a complete Turing machine. [A-z] is wrong due to implementation of the expressions engine. They are most likely implemented in a way, that uses character 'A' as x41. Since 'Z' is x5A and 'a' is x61 there is a gap in there that would include a bunch of other characters.

Re:Unacceptable mistakes (1)

khrtt (701691) | more than 9 years ago | (#12015254)

Why is [A-z] wrong

Because there are some characters between the letters Z and a in ASCII.

what's the correct way to do it?

[A-Za-z] - for us-ascii, or
[[:alpha:]] - for other charsets, if your system supports it.

jeez (0, Redundant)

roman_mir (125474) | more than 9 years ago | (#12015293)

way to ask a question that would certainly cause at least posts to be moderated as 'Redundant'!

Re:jeez (0, Redundant)

roman_mir (125474) | more than 9 years ago | (#12015337)

way to ask a question that would certainly cause at least 10 posts to be moderated as 'Redundant'!

Man, I wish there was a way to edit a submitted comment.

Re:jeez (1)

Surt (22457) | more than 9 years ago | (#12015600)

That would totally change the nature of slashdot. Think about what would happen to arguments if you could go back and make little corrections to your logic/premises. You'd be able to make your responders look like fools.

phew! (0)

Anonymous Coward | more than 9 years ago | (#12015321)

I can see a lot of mod points wasted here to mark all these comments (but the first) redundant.
The problem is, you load the page, read, and by the time you reply there are already others that replied the same thing.

Re:Unacceptable mistakes (1)

Kiryat Malachi (177258) | more than 9 years ago | (#12015367)

*Technically*, [A-z] does capture all letters. It does not, however, capture *only* letters.

(Just to be pedantic.)

Re:Unacceptable mistakes (2, Informative)

Speare (84249) | more than 9 years ago | (#12015532)

No, [A-z] does not capture all letters. For example, "Å" and "é" are not usually included in the class [A-z], but they are often part of the class \w.

Re:Unacceptable mistakes (1)

Kiryat Malachi (177258) | more than 9 years ago | (#12015648)

I don't consider those letters, you damn foreign devil.

(I kid, I kid.)

Minor variations (5, Funny)

pocari (32456) | more than 9 years ago | (#12015009)

However, I did feel a little cheated by the fact that several chapters covered essentially the same task, with only minor variations.

I can relate. I have cookbooks for food that have all these recipes that are nothing but flour, butter, eggs, and sugar. Do we need all these recipes for pancakes, cupcakes, cookies, crepes, waffles, popovers, bread, quick bread, bread sticks? Won't people figure out eventually to put a little less sugar in waffles with savory ingredients?

Japanese cookbooks are even worse. Soy sauce, sake, mirin...boooooooring!

Re:Minor variations (1)

null etc. (524767) | more than 9 years ago | (#12015061)

I can relate. I have cookbooks for food that have all these recipes that are nothing but flour, butter, eggs, and sugar. Do we need all these recipes for pancakes, cupcakes, cookies, crepes, waffles, popovers, bread, quick bread, bread sticks?

If you think there's only a minor variation between cookies and bread, let me adopt you. You'll be the easiest kid ever to take care of.

Yum, peanut butter and jelly cookies'mich.

Re:Minor variations (1)

brayniac (823642) | more than 9 years ago | (#12015185)

this is a poor analogy. you're saying that using the same ingredients for different tasks is the same as using the same basic ingredients for very similar tasks. i don't need my metaphorical cookbook to have 5 recipes for rye bread.

add this book to your list (3, Informative)

yagu (721525) | more than 9 years ago | (#12015014)

While I can't vouch for the quality of the reviewed book, if you want something definitive on regular expressions, Mastering Regular Expressions, Second Edition [amazon.com] by Jeffrey E. F. Friedl is an absolute must for your professional library. Jeffrey breaks down and then builds back up what regular expressions are and how they work, and offers an entire matrix breakout of the slightly different implementations among the most common utilities (grep, sed, awk, perl...). Not to shill for amazon, but if you select the reviewed book, the "buy this book too, and you get this great price" deal actually includes Mastering Regular Expressions, Second Edition. Get 'em both, you won't be sorry.

Re:add this book to your list (0)

Anonymous Coward | more than 9 years ago | (#12015217)

How much does one get for a referral such as yours?

Re:add this book to your list (1)

yagu (721525) | more than 9 years ago | (#12015327)

I wish... (hope we're far enough to be out of the modding radar....). I actually have had a recent very bad experience with amazon.... so this took a bit of a swallow to recommend this way, but the "Mastering..." is SUCH a great book... I think any professional should have at LEAST "Mastering..." as part of their library (like I said in original post, can't vouch for that book... the general reviews I've seen lead me to think it isn't nearly as good).

two problems (4, Funny)

EphemeralPhart (107572) | more than 9 years ago | (#12015026)

Some people, when confronted with a problem, think ``I know, I'll use regular expressions.'' Now they have two problems.

Jamie Zawinski

Re:two problems (1)

GerritHoll (70088) | more than 9 years ago | (#12015430)

The original post can be found here [google.nl]

Re:two problems (0)

Anonymous Coward | more than 9 years ago | (#12015520)

And some people have their head up their ass, like on how to turn a profit - like with a nightclub.

Alternatively, check out Textpipe (0)

Anonymous Coward | more than 9 years ago | (#12015054)

from http://datamystic.com/ [datamystic.com]

it has easy patterns:

http://datamystic.com/easypatterns.html [datamystic.com]

I used easy patterns in a project and the language is like an extra layer on top of regex making it simpler. Maybe the proprietary nature of easy patterns isn't great but there are some free tools that do conversions into Perl patterns.

I need something easier (0)

Anonymous Coward | more than 9 years ago | (#12015124)

Every now and then, (like once or twice a year) I can benefit from using regular expressions. It isn't worth my while to spend a lot of time learning the really arcane stuff that I need to know to use them. It's usually easier to find another way around the problem.

On the other hand, if someone produced a tool that can take any idiot (me for instance) through a step by step process that doesn't require a lot of prior knowledge and gets the job done; then I'd get really excited. For sure, I won't be reading this book; the effort will never repay itself.

Linda Richmond says... (5, Funny)

Anonymous Coward | more than 9 years ago | (#12015146)

I'm feeling a bit verklempt!

Talk amongst yourselves!

Alright, I'll give you a tawpic:

"Regular Expressions are neither regular nor expressions."

Discuss.

Re:Linda Richmond says... (0)

Anonymous Coward | more than 9 years ago | (#12015469)

Hilarious!

BTW, it's "Linda Richman".

a Cookbook eh? (3, Funny)

chiapetofborg (726868) | more than 9 years ago | (#12015187)

Anyone have any good recipes for [cookies]+ ?

Quoth Zawinski (0, Redundant)

Stavr0 (35032) | more than 9 years ago | (#12015188)

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski alt.religion.emacs 1997/08/12

Common RegExp Mistake (1)

samspot (852078) | more than 9 years ago | (#12015320)

It's \. not /. =P
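The joke has a real point: in Python, as in Perl, an unescaped dot matches any character, so a hostname pattern without backslashes accepts more than intended.

```python
import re

# '.' is a metacharacter: unescaped, it matches any single character.
print(bool(re.fullmatch(r"slashdot.org", "slashdotXorg")))   # True
print(bool(re.fullmatch(r"slashdot\.org", "slashdotXorg")))  # False
print(bool(re.fullmatch(r"slashdot\.org", "slashdot.org")))  # True
```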

Regexes are overused (5, Informative)

ryantate (97606) | more than 9 years ago | (#12015263)

Anyone who drops in regularly on a Perl discussion forum (like perlmonks.org) knows that programmers tend to over-use regular expressions.

Regexes are actually a pretty poor way to extract information from comma-delimited or tab-delimited files, for example. By the time you're done dealing with escaped commas, escaped tabs, quoting characters (which many CSV and TDT exporters use in addition to commas and tabs), escaped quote characters, escaped newlines, and escaped escape chars, you end up with a super-complicated regex.
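A small Python illustration of the CSV point, using the standard csv module (the sample line is invented): a regex split on commas breaks a quoted field apart, while the parser handles quoting and doubled-quote escapes.

```python
import csv
import re

line = 'widget,"3,5 mm","say ""hi"""'

# Naive regex split breaks on the comma inside the quoted field.
print(re.split(r",", line))
# ['widget', '"3', '5 mm"', '"say ""hi"""']

# The csv module understands quoting and doubled-quote escapes.
print(next(csv.reader([line])))
# ['widget', '3,5 mm', 'say "hi"']
```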

HTML is even more complicated. You have HTML comments and nested tags on top of everything else.
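As an illustration of the parser route for HTML, here is a sketch using Python's standard `html.parser` (a rough analogue of Perl's HTML::Parser, not the same API). The `LinkCollector` class and the sample page are hypothetical; the point is that the parser routes commented-out markup to a separate comment handler, so the dead link never shows up as a tag.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, where value may be None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

# Hypothetical page: a live link with a nested tag inside it, plus a link
# hidden in an HTML comment that a naive regex would happily match.
page = """
<p>See <A HREF="http://example.com/"><b>this</b></A>.</p>
<!-- <a href="http://old.example.com/">old link</a> -->
"""

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['http://example.com/']
```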

To validate a simple email address, Jeffrey Friedl in his Mastering Regular Expressions book for O'Reilly writes an *11-page* regex.

Most of the time the correct answer is not "here is a regex recipe" but rather "here is a simple library to do the job properly with a parser", like Text::CSV or HTML::Parser in Perl.

Re:Regexes are overused (2, Informative)

stratjakt (596332) | more than 9 years ago | (#12015324)

Of course, the compiled regex will likely be faster than any parsing library you write. So it all depends what you're doing.

For some sort of system that processes umpteen billion transactions per second, they can be a godsend. For parsing a .conf file once every six months when the machine is rebooted, it's a waste of time.

It's all about knowing how and when to use the tool. A pneumatic nailgun can save a carpenter hours on a jobsite, but it's a waste of time to set it all up if you only need to knock in one nailhead that's popped through the drywall.

Re:Regexes are overused (1)

ryantate (97606) | more than 9 years ago | (#12015451)

Very true. But I doubt someone who knows how to benchmark code and is handling thousands or more transactions per second is grabbing a regex recipe out of a book.

Re:Regexes are overused (1)

JoshRosenbaum (841551) | more than 9 years ago | (#12015377)

I was going to make this exact point myself as soon as I saw the words CSV/HTML/URLs. Most of these are things you should be doing with a proper module that parses the data.

They could of course be useful for simple jobs, but on the whole, you'd be a lot smarter to future-proof your work and do it the right way the first time.

If I had mod points, I'd give them to you.

-- Josh

Re:Regexes are overused (3, Insightful)

Black Perl (12686) | more than 9 years ago | (#12015431)

Yes, exactly. Any good book on Regexes should have a chapter on when NOT to use them.

I see many people trying to use regexes to do parsing, when they should be using a specialized parser.

Re:Regexes are overused (2, Informative)

smitty_one_each (243267) | more than 9 years ago | (#12015467)

Consider the Boost libraries: http://boost.org/ [boost.org]

You get tokenizer, regex, and a parser library (Spirit), sorted by increasing caliber.

It's all about the right tool for the job.

Re:Regexes are overused (4, Funny)

Anonymous Coward | more than 9 years ago | (#12015541)

> *11-page* regex.

I think that's a sure sign of insanity. Or autism at the least.

Re:Regexes are overused (1)

MikeBabcock (65886) | more than 9 years ago | (#12015655)

I agree -- many parsing jobs are much simpler doing basic character-at-a-time C code, especially validation.

If you're searching for occurrences of something or other in a long document, grep (with regexes) is obviously going to be the easy way, but if you want to extract the hostname from a URI, just code it.
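In Python the character-level scanning for a URI has already been written for you. The sketch below swaps in the standard library's `urllib.parse.urlsplit` for the hand-written C loop suggested above; `hostname_of` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlsplit

def hostname_of(uri: str) -> Optional[str]:
    # urlsplit separates scheme, netloc, path, query and fragment;
    # .hostname strips any user:password@ prefix and :port suffix and
    # returns the host in lower case, or None if there is no netloc.
    return urlsplit(uri).hostname

print(hostname_of("http://user:pw@WWW.Example.com:8080/path?q=1#frag"))
print(hostname_of("/relative/path"))
```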

dating theory (1)

bizmark22 (823743) | more than 9 years ago | (#12015276)

I tried using regular expressions to figure out my girlfriend, she then tried dating regular guys.

F*ck this book and all others like it: (1, Informative)

stratjakt (596332) | more than 9 years ago | (#12015280)

All you need is regexlib.com and a copy of Regulator (I believe that's the free-as-in-beer one), which will break out a regex into English steps like "capture (", "capture 3 or more 0's", and so on. .NET has a regex facility that's slicker than greased pigeon shit, so I've been making heavy use of it lately.

Re:F*ck this book and all others like it: (1, Interesting)

yahyamf (751776) | more than 9 years ago | (#12015371)

.Net regular expressions can parse from right to left as well. Very useful sometimes.

Regexen that parse from right to left (1)

jonadab (583620) | more than 9 years ago | (#12015549)

> .Net regular expressions can parse from right to left as well.
> Very useful sometimes

Yeah, especially for parsing Hebrew text. HTH. HAND.

Re:F*ck this book and all others like it: (0, Flamebait)

Anonymous Coward | more than 9 years ago | (#12015466)

Yeah, mod me down as a troll, don't even READ my comment. Or maybe I was modded down for praising something in .NET.

Whatever.

You dumb slashbot fucks have no idea what a regex is or where it's used and wouldn't know one if it was right in front of you. You probably know it's sort of a Linux thing, so that's good.

Sycophants and asshats, monkeys who crawl around above my office trying to figure out which wire the rats chewed through. Know-nothing idiots who ask me to unplug my cablemodem and plug it back in when I call to report that a big rig just rolled by and yanked the whole goddamned wiring bundle out of the side of my house.

BUY THIS BOOK FOR 39.99!!! Because regexes are magic and you need to give /. bn referrer money, there's absolutely no way to figure it out for free, nor is there an online library that already has dozens of examples for whatever you might need.

Fuck you and your iPods. All those white earbuds do is help me pick out the clueless wannabes. No true geek would own one.

Regex Coach helps building Regexp (5, Informative)

uss_valiant (760602) | more than 9 years ago | (#12015294)

Regex Coach [weitz.de]

This program helps you build regular expressions. I've never used it myself (real men write a regexp in one go and it works), but some friends recommend it.

Re:Regex Coach helps building Regexp (1)

GodLived (517520) | more than 9 years ago | (#12015514)

Don't you mean, Regex Crutch?

IMHO, if you're using UNIX, you gotta learn some basic regexes backward and forward. They are used in vi, grep, stream editing, and many other text utilities. If you don't innately know some basic constructs, you will be forever asking your cubicle mate, or worse, be like my cube mate, using vi and clacking out "j-j-cw-foo-enter", "j-j-cw-foo-enter", "j-j-cw-foo-enter", "j-j-cw-foo-enter", ... A tool like Regex Coach would just make you dependent on it.
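The repetitive edit described above is exactly what one global substitution replaces; in vi it would be something along the lines of `:%s/\<bar\>/foo/g`. The same idea in Python, as a sketch with made-up text, where `\b` plays the role of vi's `\<` and `\>` word boundaries:

```python
import re

# Hypothetical buffer: every standalone "bar" should become "foo".
text = "bar baz\nbarn bar\nbar"

# \b anchors the match at word boundaries, so "barn" is left alone;
# re.sub replaces every match in a single pass.
result = re.sub(r"\bbar\b", "foo", text)
print(result)
```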

Now they have two problems (0, Redundant)

GerritHoll (70088) | more than 9 years ago | (#12015405)

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (source [google.nl])

ambiguous use of "they/them" (1)

pomakis (323200) | more than 9 years ago | (#12015422)

The vi-style regular-expression substitution technique might help: :-)

"If you spend time working writing applications that have to do pattern matches and/or replacements, you know about some of the intricacies of \(regular expressions\). For \(many people\) \1 can be an arcane hodgepodge of odd characters that somehow manage to do wonderful things, but \2 don't have enough time (or interest) to really understand how to code \1."
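The substitution above relies on backreferences: `\1` and `\2` stand for whatever the first and second groups captured. A short Python sketch of the same mechanism (Python writes groups as plain `(...)` where vi's basic regexes use `\(...\)`; the sample strings are made up):

```python
import re

# In the replacement string, \1 and \2 refer to the text captured by the
# first and second parenthesized groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "regular expressions")
print(swapped)  # expressions regular

# A backreference can also appear in the pattern itself: \1 must repeat
# exactly what (\w+) matched, which is a classic way to find doubled words.
doubled = re.search(r"\b(\w+) \1\b", "they can be be arcane")
print(doubled.group(1))  # be
```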

Different flavors? (3, Informative)

dpbsmith (263124) | more than 9 years ago | (#12015448)

In an average month, I use regular expressions as implemented in Microsoft Visual C++ 6.0, BBEdit Lite, TextWrangler, Apple MPW, and REALBasic. Every single one of them has _significant_ differences in syntax and semantics.

My understanding is that even the UNIX world sports several different flavors of regular expression in grep, egrep, fgrep, etc.

The biggest barrier to _my_ use of regular expressions is that every time I switch from one regular expression context to another, it takes me a good half hour to refresh my memory of what does and doesn't work in each environment.

Re:Different flavors? (1)

wk633 (442820) | more than 9 years ago | (#12015633)

> My understanding is that even the UNIX world sports several different flavors of regular expression in grep, egrep, fgrep, etc.

Er, well, not exactly. grep, extended (egrep) and fixed (fgrep) allow for different feature/speed tradeoffs, but they are consistent in their use of regular expressions. Where you will find differences is between the regex syntax of vi, perl, sed, grep, etc.

After ten+ years, I still consult a reference for all the escape codes and such. Used to be a book, now it's google.
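On the speed end of that tradeoff, fgrep (`grep -F`) treats its pattern as a fixed string with no metacharacters at all, which is exactly why it can be faster. A Python sketch of the difference; `fixed_grep` and `regex_grep` are hypothetical helpers, not real grep bindings:

```python
import re

lines = ["price: 3.50", "price: 3550", "a+b=c"]

def fixed_grep(needle: str, lines: list[str]) -> list[str]:
    # Like fgrep / grep -F: plain substring search, no metacharacters.
    return [line for line in lines if needle in line]

def regex_grep(pattern: str, lines: list[str]) -> list[str]:
    # Like grep / egrep: the pattern is interpreted as a regular expression.
    return [line for line in lines if re.search(pattern, line)]

print(fixed_grep("3.5", lines))             # ['price: 3.50']
print(regex_grep("3.5", lines))             # ['price: 3.50', 'price: 3550']
print(regex_grep("a+b", lines))             # [] since "a+" means "one or more a"
print(regex_grep(re.escape("a+b"), lines))  # ['a+b=c']
```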

HTML, XML, CSV, but why? (3, Interesting)

AGTiny (104967) | more than 9 years ago | (#12015499)

Of course everyone should know how to build a regex, but why take time discussing how to parse common formats such as HTML, XML, CSV, and so on? Every language likely has a good standard module/library/package that does it all for you, hopefully in the most efficient way, and gives you an easy API. I write Perl, and have used XML::*, HTML::*, DBD::CSV, Text::CSV; the list goes on. No need to write a single regex there. Another good set of modules is Regexp::Common, which gives you correct regexes for parsing semi-hard things like IP addresses, MAC addresses, phone numbers, etc.
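The same "let the library do it" approach exists outside Perl. As a sketch, Python's standard `ipaddress` module can stand in for a hand-written IP-address regex (this is a Python analogue, not Regexp::Common itself); `is_ip_address` is a hypothetical helper name.

```python
import ipaddress

def is_ip_address(text: str) -> bool:
    # ip_address() accepts IPv4 and IPv6 strings and raises ValueError for
    # anything else, including out-of-range octets such as 256.
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True

for candidate in ["192.168.0.1", "256.1.1.1", "::1", "1.2.3"]:
    print(candidate, is_ip_address(candidate))
```

A regex that rejects 256 while accepting 255 is surprisingly fiddly to write; the library handles range checks and IPv6 for free.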

Darn (1)

SleepyHappyDoc (813919) | more than 9 years ago | (#12015605)

I was hoping for an innovatively written cookbook for geeks (shell scripts to describe how to make a white sauce, that kinda thing). That would have made a fantastic gag gift.

Great, 95 more regexes to rm (1)

my_haz (840523) | more than 9 years ago | (#12015638)

I don't think I'm alone in saying (having spent plenty of time on freenode #sed) that of the many regexes I have had to formulate, only about 5% are really reusable. Most of the time it's "get some info from file X into file Y" or "make odd file X pretty." So I could buy this book, but then I would have 95 more example regexes to toss out.

Free Alternative (4, Informative)

MudButt (853616) | more than 9 years ago | (#12015661)

This is free... And interactive...
http://www.regexlib.com/ [regexlib.com]