(Useful) Stupid Regex Tricks?

ScuttleMonkey posted more than 5 years ago | from the hope-you-like-reading-lots-of-random-characters dept.

Programming 516

careysb writes to mention that in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. "What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?"

516 comments

IP and Hardware addresses (5, Insightful)

rallymatte (707679) | more than 5 years ago | (#25703249)

To filter a string to make sure it's a valid ip address this regexp is quite useful.
/^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

And this one for mac addresses
/^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
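For anyone who wants to poke at these two patterns outside Perl, here is a minimal Python sketch. The patterns are the ones from the comment above, only assembled from a repeated piece to keep the lines short; the helper names `is_ipv4` and `is_mac` are made up for illustration.

```python
import re

# One octet, 0-255, with no leading zeros (same alternation as in the comment).
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"

# ^((octet)\.){3}(octet)$
IPV4_RE = re.compile(r"^(" + OCTET + r"\.){3}" + OCTET + r"$")

# Six two-digit hex groups joined by colons.
MAC_RE = re.compile(r"^" + ":".join([r"[0-9a-fA-F]{2}"] * 6) + r"$")


def is_ipv4(text):
    """True if text is a dotted quad with each octet in 0-255."""
    # fullmatch() avoids Python's habit of letting $ match before a trailing newline.
    return IPV4_RE.fullmatch(text) is not None


def is_mac(text):
    """True if text looks like XX:XX:XX:XX:XX:XX in hex."""
    return MAC_RE.fullmatch(text) is not None
```

Note that this is purely a syntax check: it says nothing about whether the address is routable or assigned.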

Re:IP and Hardware addresses (2)

fbjon (692006) | more than 5 years ago | (#25703429)

Are IP addresses with leading zeroes usually considered invalid?

Re:IP and Hardware addresses (1)

LordKronos (470910) | more than 5 years ago | (#25704293)

I'm not sure what the standard says, but 127.0.000000.1 gets to my web server in Firefox, IE, Safari, Opera, and Chrome. So it might be better to do: /^(0*(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}0*(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

Re:IP and Hardware addresses (3, Insightful)

Poltras (680608) | more than 5 years ago | (#25703435)

So if I get this right, 0.0.0.0 is a valid ip address? I know the real regex would take a full post, but yes, it is possible to check with a single regex if it is valid, if it makes sense (127.0.0.1, 10.*, 169.254.*, etc etc) and if it's not a broadcast or a network address (not taking netmask into account).

Re:IP and Hardware addresses (3, Insightful)

plumby (179557) | more than 5 years ago | (#25704257)

So if I get this right, 0.0.0.0 is a valid ip address?

If you mean "Is it an address that you can send IP traffic to?", then the answer is no. If you mean "Is it a valid value that can end up in an IP address field (e.g., in the response to the ipconfig command)?" then the answer is yes - it means that you've not got a connection.
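The "does it make sense" part of the question is arguably easier to answer without a regex at all. A rough Python sketch using the standard `ipaddress` module is below; the function name `describe` and the category labels are assumptions chosen for this example, not anything from the thread.

```python
import ipaddress


def describe(addr):
    """Classify an IPv4/IPv6 address string into a coarse category."""
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    # Order matters: the special ranges below also count as "private"
    # in the ipaddress module, so test the narrower properties first.
    if ip.is_unspecified:
        return "unspecified"   # 0.0.0.0
    if ip.is_loopback:
        return "loopback"      # 127.0.0.0/8
    if ip.is_link_local:
        return "link-local"    # 169.254.0.0/16
    if ip.is_private:
        return "private"       # e.g. 10.0.0.0/8
    return "other"
```

This does not look at netmasks either, so broadcast and network addresses still need the subnet to be known.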

Re:IP and Hardware addresses (1)

tfeserver (1065880) | more than 5 years ago | (#25703641)

a "small" date verification /^(?:(?:(?:(?:[0-2][0-9]?)|(?:3[0-1]))\/(?:(?:0?[13578])|(?:1[02])))| (?:(?:(?:[0-2][0-9]?)|30)\/(?:(?:0?[469])|11))|(?:(?:(?:[0-1][0-9]?)| 2[0-8])\/(?:0?2)))\/\d{2}(?:\d{2})?\s+(?:[0-1]?[0-9]|2[0-4]):(?:[0-5]?[0-9])$/x

Re:IP and Hardware addresses (5, Informative)

Richard_J_N (631241) | more than 5 years ago | (#25703653)

Of course, you can do better still. For mac addresses, try:
    ^([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}$
[:xdigit:] is short for hexadecimal digits, i.e. a-fA-F0-9
We can also loop 5 times over the 'XX:' sections.

Re:IP and Hardware addresses (5, Funny)

rallymatte (707679) | more than 5 years ago | (#25703761)

Not only are you showing off with a lower member id than me, do you also have to come up with a cooler regexp than me?

Re:IP and Hardware addresses (4, Funny)

alta (1263) | more than 5 years ago | (#25703963)

I can easily beat you on the UID, but I couldn't regex the a out of an apple.

Re:IP and Hardware addresses (4, Interesting)

nschubach (922175) | more than 5 years ago | (#25704367)

There's a really cool little "real time" regex analyzer written in Flex: (if you're not one of them scared to death by Flash content)

http://gskinner.com/RegExr/ [gskinner.com]

Maybe you can monkey your way into "regexing" the a out of apple :p

Re:IP and Hardware addresses (5, Informative)

Speare (84249) | more than 5 years ago | (#25703735)

For pretty much any useful stock problem solved by regular expressions, see Perl's Regexp::Common [cpan.org] module. A lot of these patterns are fiendishly complicated to deal with edge-cases properly.

Re:IP and Hardware addresses (1)

david.given (6740) | more than 5 years ago | (#25703759)

^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$

I'm not sure this is valid --- it doesn't accept non-dotted IP addresses, does it? i.e. expressing 127.0.0.1 as 2130706433. (Or 127.1, which is equally, and surprisingly, valid.)
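The integer form is just the four octets packed into 32 bits (127 * 2^24 + 1 = 2130706433). A small Python sketch of the round trip, with helper names `to_int` and `from_int` invented here for illustration:

```python
import ipaddress


def to_int(dotted):
    """Dotted quad -> 32-bit integer, e.g. for comparing with the non-dotted form."""
    return int(ipaddress.ip_address(dotted))


def from_int(number):
    """32-bit integer -> dotted quad string."""
    return str(ipaddress.ip_address(number))
```

As far as I know the `ipaddress` module itself only takes the full four-octet spelling or an integer, so the "127.1" shorthand would have to be expanded some other way.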

Re:IP and Hardware addresses (1)

1s44c (552956) | more than 5 years ago | (#25703839)

To filter a string to make sure it's a valid ip address this regexp is quite useful. /^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

It really doesn't look like that will match all valid IPs, although it is very thorough. Surely 010.000.000.001 is still valid?

I normally use something like: /^\([0-2]\{0,1\}[0-9]\{1,2\}\.\)\{3\}[0-2]\{0,1\}[0-9]\{1,2\}$/
( sed style, not perl style. Perl style would of course be shorter. )

Re:IP and Hardware addresses (3, Informative)

Bazzargh (39195) | more than 5 years ago | (#25704197)

/^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

Try this: /^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.|$)){4}/

And similarly: /^(([0-9a-fA-F]{2})(:|$)){6}$/

(term(delimiter|$)){n} is the generic stupid regex trick here. Works in perl, ymmv elsewhere.

-Baz

Re:IP and Hardware addresses (1)

Bazzargh (39195) | more than 5 years ago | (#25704273)

gah, obviously that matches an extra . - brainfart. /^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.|$)){4}(?<!\.)/

avoids this, still without repeating the term pattern. That last bit is the perlre for a zero-width negative look-behind assertion.
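Here is a hedged Python sketch of the (term(delimiter|$)){n} trick with a look-behind spelled `(?<!\.)`, which Python's `re` accepts because it is fixed-width. The names `TERM`, `DOTTED_QUAD` and `is_dotted_quad` are made up for this example.

```python
import re

# One octet, written once; the empty alternative lets a single digit through.
TERM = r"(25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])"

# Four repetitions of "term followed by a dot or end of string".
# The trailing look-behind rejects the case where the fourth repetition
# consumed a dot, so the match can only finish on the $ branch.
DOTTED_QUAD = re.compile(r"^(" + TERM + r"(\.|$)){4}(?<!\.)")


def is_dotted_quad(text):
    """True if text is exactly four 0-255 octets separated by dots."""
    return DOTTED_QUAD.match(text) is not None
```

The point of the trick is that the octet pattern appears only once, so a fix to it cannot be forgotten in a second copy.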

-Baz

Re:IP and Hardware addresses (1)

LordKronos (470910) | more than 5 years ago | (#25704213)

And this one for mac addresses /^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/

Wouldn't it be nicer if you did: /^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$/

Re:IP and Hardware addresses (1)

neoform (551705) | more than 5 years ago | (#25704389)

I tried posting them right into the window, but slashdot's fantastic filters labeled the regex's as "junk characters"..

http://city17.ca/regex.txt [city17.ca]

Here's a bunch of filters I've used (in PHP).

On a side note, here's some fantastic PHP code I found:

$query_login="select * FROM user";
$result_login = mysql_query($query_login) or die("Query failed"); //$login_check = mysql_num_rows($result_login);

while($row=mysql_fetch_array($result_login))
{
$username=$row["username"];

  if ($username==$username1)
  {
        echo "";
echo "window.location.href='login_error.php?rec=qq';";
echo "";
        exit;
    }
}

Here's One for Slashdot Stories! (4, Funny)

Anonymous Coward | more than 5 years ago | (#25703303)

(Useful) Stupid * Tricks

Yes sir, that will guarantee a front page story. You better head back to the drawing board if it doesn't fit that pattern. Next week: (Useful) Stupid Starcraft Tricks.

Re:Here's One for Slashdot Stories! (4, Funny)

Malevolent Tester (1201209) | more than 5 years ago | (#25703785)

Next week: (Useful) Stupid Starcraft Tricks.

You can assign a building, building add-on, or a group of up to 12 units to a single key. To do this, select what you want to assign, then hold down Control and select a number on the keyboard between 0-9. Then, when you want to select what you assigned, simply press the number of the group that you want. Pressing a group number twice will center the screen on the group.

Re:Here's One for Slashdot Stories! (4, Funny)

McWilde (643703) | more than 5 years ago | (#25703813)

That doesn't look right...
Try:

/^\(Useful\) Stupid \w+ Tricks$/

Also, I noticed that the previous stupid tricks stories ended with a question mark, but this one doesn't. So:

/^\(Useful\) Stupid \w+ Tricks\??$/

Re:Here's One for Slashdot Stories! (1)

LordKronos (470910) | more than 5 years ago | (#25704107)

That won't do what you want in most regex flavors. What you want (at least in perl) is something more like:
\(Useful\) Stupid .+ Tricks

(feel free to wrap it in // or /^$/ if you like)

Re:Here's One for Slashdot Stories! (2, Interesting)

Talderas (1212466) | more than 5 years ago | (#25704297)

You can permanently cloak zerg units that can burrow if you control an arbiter. By burrowing the zerg unit just as it enters the arbiter's cloaking field radius, the zerg will become permanently cloaked.

New Slashdot Section (5, Interesting)

Frankie70 (803801) | more than 5 years ago | (#25703329)

Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.

Re:New Slashdot Section (0, Troll)

iammani (1392285) | more than 5 years ago | (#25703495)

Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.

...So that i can block "Stupid Tricks" posts from appearing on my slashdot home page

Re:New Slashdot Section (0)

Anonymous Coward | more than 5 years ago | (#25703499)

Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.

+1

Please restore the universe's balance or at least kill the idle section thing.

Re:New Slashdot Section (1)

VitaminB52 (550802) | more than 5 years ago | (#25703745)

Next installment: "Useful Stupid Manager Tricks"
  • If it's understandable by a manager, then it's stupid...
  • and if it gives you your much deserved salary hike, then it's useful

ARGH!!!! (2, Funny)

soapdog (773638) | more than 5 years ago | (#25703401)

You see yourself in digg.com. You are likely to be eaten by a grue.

Re:ARGH!!!! (2, Insightful)

Anonymous Coward | more than 5 years ago | (#25703885)

So clearly, Slashdot's shit never stank?

No, seriously, why the bitching? Did you expect the site to just keep reporting dry stories about incremental Linux kernel upgrades for its entire existence? You expected a website to never change and never update with the times? Just because it's old doesn't mean it's sacred.

Re:ARGH!!!! (0)

Anonymous Coward | more than 5 years ago | (#25704067)

Yes it so does.

Regex Support (2, Interesting)

Extremus (1043274) | more than 5 years ago | (#25703407)

I have used regex in the past, mainly for keeping long SQL scripts. The problem is the lack of full support for regex in most of editors. IMO the best (for windows, at least) is the EditPad Pro [editpadpro.com].

How about (3, Funny)

cbiltcliffe (186293) | more than 5 years ago | (#25703409)

Stupid (Useful) Ask Slashdot tricks?

I'm not sure whether these are legitimate, or just a "I don't know what the hell I'm doing, so let's see if I can get someone else to show me how to do my job, under the guise of sharing information."

I'd like to say the former, but my cynicism is making me lean to the latter.....

Re:How about (5, Interesting)

Anonymous Coward | more than 5 years ago | (#25703477)

I actually like these. Nice little highly enriched concentrations of geekery on a single page. Think how long it might take to round up the sort of stuff that appears here by Googling.

Turing word: insipid
In a sentence: You find this page insipid but I find it inspiring.

Re:How about (1)

cbiltcliffe (186293) | more than 5 years ago | (#25703533)

You're probably right. And there are certainly some useful nuggets on these pages, but I wouldn't be Googling for regex's anyway. That's the kind of stuff I'd write from scratch, because I want to be sure of what it does.

Maybe I'm weird that way, but I don't often take complex programming code and just copy/paste from Google. I don't trust the Internet that much.

Although that could be because I'm also a musician, and Googling lyrics/chords, etc for songs inevitably leads to some stuff where you think "Was this guy listening to the same song I am?"

Re:How about (1)

ohxten (1248800) | more than 5 years ago | (#25703539)

Lighten up, I find the comments for these "(Useful) Stupid * Tricks" stories to be very entertaining.

Regexp-based address validation (5, Informative)

mutende (13564) | more than 5 years ago | (#25703443)

Beautiful regexp that validates RFC 822 addresses: Mail-RFC822-Address.html [ex-parrot.com]

Re:Regexp-based address validation (0)

Anonymous Coward | more than 5 years ago | (#25703683)

I'm curious, could someone explain exactly what that is about? It says it's to validate if something is a valid email address. What makes it so complex? Should it only validate if there's an @ sign and then a . to have it be a valid email address? What more complex cases are there?

Re:Regexp-based address validation (1)

maxume (22995) | more than 5 years ago | (#25703853)

Can a valid address contain more than one @ character?

If it can't, that regex, in my opinion, is really stupid, as it would be much clearer to deal with the part of the address before the @ separately from the part after the @.

Re:Regexp-based address validation (2, Informative)

Daas (620469) | more than 5 years ago | (#25704051)

It matches the entire RFC, not just the you@slashie.com .

  <You @ Slashie> you@slashie.com

Should be valid if I remember correctly.

Re:Regexp-based address validation (1)

maxume (22995) | more than 5 years ago | (#25704173)

Well, that makes more sense. It still seems a bit saner to break it up, but I don't think in regular expressions, I think in English and slowly test my way to something that works for what I am doing. It is a good thing I am not responsible for writing anything important, just this or that for my own use.

Re:Regexp-based address validation (0)

Anonymous Coward | more than 5 years ago | (#25704085)

It's been some time since I tested it but that regex fails on many addresses (both valid and invalid).

The best regex tip of them all: Never use a regex in situations that require a parser!
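To illustrate the "use a parser" tip: Python's standard library ships one for address headers. This is only a sketch of pulling the display name and the address apart, not a validator, and the wrapper name `split_address` is an assumption for this example.

```python
from email.utils import parseaddr


def split_address(header_value):
    """Return (display_name, address) using the stdlib header parser."""
    # parseaddr() returns ('', '') when it cannot make sense of the input.
    return parseaddr(header_value)


if __name__ == "__main__":
    print(split_address("You <you@slashie.com>"))
    print(split_address('"You @ Slashie" <you@slashie.com>'))
```

Whether the address actually exists is a separate question that only sending mail can answer.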

Mainframe Formatting (1)

jchawk (127686) | more than 5 years ago | (#25703447)

I use this to remove formatting that is included in the reports spit out from the mainframe -

cat REPORT_NAME | sed 's/[^a-z0-9,.-]//gi' > REPORT.out

It uses a few commands to accomplish this but I figured I would include the entire command line for completeness. It keeps all letters, numbers, ',', '.', and '-'. If you need other characters you can always add them to the regular expression.
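For comparison, roughly the same filter in Python. This is a sketch that assumes the report is processed one line at a time, as sed does; `strip_formatting` is a name made up for the example.

```python
import re

# Anything that is not a letter, digit, comma, period or hyphen gets dropped.
# The hyphen sits last in the class so it is taken literally.
UNWANTED = re.compile(r"[^a-z0-9,.-]", re.IGNORECASE)


def strip_formatting(line):
    """Remove every character outside the whitelist from one line."""
    return UNWANTED.sub("", line)
```

Note that spaces are not on the whitelist, so words run together, exactly as with the sed version.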

Windows (3, Informative)

jgtg32a (1173373) | more than 5 years ago | (#25703451)

MS Office does support regexps; while not as good as Perl's, they are very helpful.

Link to an Excel .bas add-on for regexp, which helped me a lot.
Don't forget to add the lib {Tools->References->Microsoft VBScript Regular Expressions 5.5}

http://www.tmehta.com/regexp/using_functions.htm [tmehta.com]

Re:Windows (0)

Anonymous Coward | more than 5 years ago | (#25704403)

I use eclipse as a C++ editor (don't kill me /.). It can use regex in the search/replace, which is really handy.

is it an rfc-822 compliant e-mail address? (3, Interesting)

Anonymous Coward | more than 5 years ago | (#25703455)

please validate using the rfc and not your sketchy interpretation of an e-mail address. /.*@.*\..*/ will not cut it.

Try instead
([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c\\x00-\\x7f)*\\x22))*\\x40([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d))*

See the original at http://www.iamcal.com/publish/articles/php/parsing_email/

Re:is it an rfc-822 compliant e-mail address? (0)

Anonymous Coward | more than 5 years ago | (#25703819)

Please validate using the rfc and not your sketchy interpretation of an e-mail address. ([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c\\x00-\\x7f)*\\x22))*\\x40([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d))* will not cut it.

Try instead http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html [ex-parrot.com]

Honestly, unless yours is perfect, don't knock other people's as if you were god.

Re:is it an rfc-822 compliant e-mail address? (0, Troll)

feldicus (1367687) | more than 5 years ago | (#25704405)

It's things like that monstrosity that reinforce my belief that learning the ins and outs of regular expressions isn't worth the work.

feldicus

hmmmmmm (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#25703461)

Go to HKEY_CLASSES_ROOT\CLSID and search for "Bin".

Change the value to rename the Bin can =]

As everything in windows, restart is needed.

Regex Bill (5, Funny)

Anonymous Coward | more than 5 years ago | (#25703517)

Why couldn't Bill try out his regular expressions?

His mom wouldn't let him play with matches.

lookahead assertion (0)

Anonymous Coward | more than 5 years ago | (#25703535)

Example of using a (positive) lookahead assertion to parse comma-delimited data:

perl -pe 's/(^|,)\\\\N(?=,|\$)/\$1\$2/g'

PCRE and perl 5.10 offer "tagged" captures (1)

BrianRoach (614397) | more than 5 years ago | (#25703553)


(?<thing>foo)

Where you can then access the matched substring ("foo" in this case) by the tag/label "thing" (access syntax depends on language).

It's pretty spiffy if you need order independent matching.
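Python has the same feature, though it spells the group `(?P<name>...)`. A minimal sketch follows; the `key=value` input format and the function name `parse_pair` are assumptions picked for illustration.

```python
import re

# Two named groups; callers ask for "key" and "value" instead of 1 and 2.
PAIR_RE = re.compile(r"(?P<key>\w+)=(?P<value>\w+)")


def parse_pair(text):
    """Return {'key': ..., 'value': ...} for the first key=value in text, else {}."""
    match = PAIR_RE.search(text)
    return match.groupdict() if match else {}
```

Because the captures are looked up by name, reordering or adding groups later does not break the calling code.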

extract web address from a string? (1)

bundaegi (705619) | more than 5 years ago | (#25703563)

This is what I ended up using:

((?:http|ftp)s?://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)

There may well be something more robust...

Silence is golden (1)

MikeRT (947531) | more than 5 years ago | (#25703579)

For that annoying user who never shuts up... /[.]//

What's that? Cat got your tongue, troll?

Re:Silence is golden (0)

Anonymous Coward | more than 5 years ago | (#25703747)

what do you have against periods?
Don't you mean something more like: /.//g

Match a library call number (4, Interesting)

Gulthek (12570) | more than 5 years ago | (#25703593)

Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.

Slashcode is interestingly interpreting my formatting, but you should get the gist.


$text =~ /
        ^[A-Z]+ # starts with at least one capital letter
        \s?     # followed by an optional space
        \d+     # followed by one or more digits
    /x
    or $text =~ /
        ^\d+    # starts with one or more digits
        \.      # followed by a single decimal
    /x
    or $text =~ /
        \d+     # starts with one or more digits
        \s      # and a space
    /x
    or $text =~ /
        Thesis  # starts with "Thesis"
        .+      # with one or more characters of any kind
        \d{4}   # then four numbers - year
        \s+     # separated by at least one space
        [A-Z]+  # from one or more capital letters
        \d+     # followed by one or more numbers
    /xi         # case ignored here in case we run into THESIS or thesis
    or $text =~ /
        \d+     # starts with one or more digits
        \-      # connected with a dash
        \d+     # to one or more following digits
    /x
    or $text =~ /
        \d+     # starts with one or more digits
        \s      # followed by a space
        [A-Z]*  # followed by zero or more capital letters
        \d+     # followed by one or more digits
    /x

Nope, not useful (4, Funny)

darkvizier (703808) | more than 5 years ago | (#25703649)

I've never found regexes to be useful at all. I prefer to write my own parsers from scratch in assembly language, or conway's game of life [wikipedia.org], if I'm feeling m/(ambitious|artistic|autistic|masochistic)/.

But even an artist gets lazy sometimes.

One for weeding out rubbish (1)

Krigl (1025293) | more than 5 years ago | (#25703655)

perl -pe '$IFS="EOF",s/^.*$//' file_with_lots_of_rubbish

Works like a charm! Especially useful for dealing with marketing papers, Czech journalism and Slashdot discussions.

Apocryphal quotation (1)

Stavr0 (35032) | more than 5 years ago | (#25703763)

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski

CWEB and Doxygen (1)

N3Roaster (888781) | more than 5 years ago | (#25703775)

Here's one I came up with recently:

If you want to get documentation out of both CWEB and Doxygen, write the Doxygen comments in the source files like @=//! Comment for Doxygen.@> to prevent ctangle from stripping the comment out, then use sed 's/@=\/.*@>//g' input.w > output.w to strip those comments out so they don't end up in the output from cweave.

One regex to match them all (4, Informative)

gzipped_tar (1151931) | more than 5 years ago | (#25703803)

This regex matches a number: integer or float, scientific notation or plain, plus or minus...

[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
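The pattern drops straight into Python unchanged. A small sketch for trying it out; the helper names `is_number` and `find_numbers` are made up here.

```python
import re

# The pattern from the comment above, verbatim.
NUMBER_RE = re.compile(
    r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?"
)


def is_number(text):
    """True if the whole string is a number according to the pattern."""
    return NUMBER_RE.fullmatch(text) is not None


def find_numbers(text):
    """Return every number-looking substring, in order of appearance."""
    # All groups are non-capturing, so findall() returns the full matches.
    return NUMBER_RE.findall(text)
```

One quirk I noticed while trying it: a leading-dot mantissa followed directly by an exponent (".5e-3") is not accepted as a whole, because there is no word boundary between the "5" and the "e".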

Re:One regex to match them all (1)

dkf (304284) | more than 5 years ago | (#25704469)

This regex matches a number: integer or float, scientific notation or plain, plus or minus...

[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?

You've omitted the ability to match other number bases. I don't know about you, but I find it useful to be able to cope with numbers in base 2, base 8 and base 16 as well (both as integers or floats, of course).

use Regexp::Common; (4, Insightful)

oneiros27 (46144) | more than 5 years ago | (#25703815)

use Regexp::Common qw(URI net);
$text_with_urls =~ m/$RE{URI}/;
$text_with_ips =~ m/$RE{net}{IPv4}/;

Remove trailing whitespace (3, Interesting)

cerberusss (660701) | more than 5 years ago | (#25703851)

To remove trailing whitespace from a textfile (vim regex, don't know if the \s will work in other regex dialects):

:%s/\s\+$//e
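On the question of other dialects: a Python sketch of the same cleanup is below. It deliberately uses `[ \t]` rather than `\s`, since in multi-line mode `\s` would also swallow newlines and collapse blank lines. The function name is an assumption for this example.

```python
import re

# In MULTILINE mode, $ matches just before every newline as well as at the end.
TRAILING_WS = re.compile(r"[ \t]+$", re.MULTILINE)


def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping blank lines."""
    return TRAILING_WS.sub("", text)
```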

replace (0)

Anonymous Coward | more than 5 years ago | (#25703889)

s/bush/obama/ig

Not very complex, but ... (1)

Bob-taro (996889) | more than 5 years ago | (#25703951)

I often use sed to split a delimited line into multiple lines. E.g.:

echo $PATH | sed 's/:/\
/g'

Be lazy! (4, Interesting)

subreality (157447) | more than 5 years ago | (#25703973)

OK, you asked for stupid tricks, but this one's just plain lazy.

Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!

/I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/

A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.

Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I hone in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.

Even in a simple case like this, half the fun is in explaining it. :)

very topical! (0)

Anonymous Coward | more than 5 years ago | (#25703977)

I've been trying to scrape a web page lately and have been trying to get a working regexp going without a huge amount of success....Anyone care to demonstrate their awesome skills by showing me how to write a regexp that will match words in data like this (or at least the first word in a string):

" 1. Word AnotherAgain "

The key point about the data is that there are words which start with exactly 1 capital letter, and may be separated from the subsequent word by a number of spaces or may run directly onto it. So the desired regexp would match Word or Another or Again depending which was the first in the data. There are also numbers e.g. "10." but it doesn't matter for my purposes whether they are discarded or matched as a word.
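One possible answer, sketched in Python: treat a word as one capital letter followed by lowercase letters, so that a run like "AnotherAgain" splits at the second capital. This assumes plain ASCII words, and the helper names are invented for this example.

```python
import re

# One capital letter, then one or more lowercase letters.
WORD_RE = re.compile(r"[A-Z][a-z]+")


def split_words(data):
    """Return all capitalised words, including ones that run together."""
    return WORD_RE.findall(data)


def first_word(data):
    """Return the first capitalised word, or None if there is none."""
    match = WORD_RE.search(data)
    return match.group(0) if match else None
```

Numbers such as "10." are simply skipped by this pattern.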

recursive regexp to match {} block (3, Informative)

doti (966971) | more than 5 years ago | (#25704037)


        my $re = '';
        $re = qr/
                \{ (?:
                        (?> [^{}]+ ) # non-braces
                |
                        (??{ $re }) # nested brace block
                )* \} /xs;

email validation... (2, Interesting)

Ramley (1168049) | more than 5 years ago | (#25704047)

This was always useful when appropriate:
/^[\w.|-]+@(?:[\w.|-]{2,63}\.)+[a-z]{2,6}$/
Loosely validates an email address (a small subset of RFC 5322) -- although not taking into account an IP address (user@192.168.1.2)

best programming editor ? (for windows) (1)

layingMantis (411804) | more than 5 years ago | (#25704071)

You're absolutely right about the crap regex support in most text editors. In my personal opinion [dankmountain.com], Dreamweaver CS3 is the best code editor I've used for regex and searching. It has standard regular expressions unlike dumbass contenders like Visual Studio or Ultraedit. It's missing some stuff, though, like named groups with named substitution, multiple line search, and (this is the worst part), the $ and ^ anchors don't seem to work. But none of that matters if your search is fucking slow (see: visual studio 2008): Dreamweaver CS3 search is very, very FAST - notepad++ simply chokes on directory searches, and all of the other lite little editors I've tried do too.

Valid Phone Number Validation that allows extensions and virtually all the common ways to list a (US) number:
/^[01]?[-\s\.]?\(?[2-9][0-9]{2}\)?([-\s\.]|(\s-\s))?[0-9]{3}([-\s\.]|(\s-\s))?[0-9]{4}\s?(([xX]\.?|(ext|EXT)\.?|\s)?\s?(?<![0-9])[0-9]{1,4})?$/

valid:

333 444 5555
1-333-444-5555
333.444-5555
3334445555 4444
333-444-5555 ext. 123
1-(340)333 5678
333 444 5555 x3456
...and lots more ways to fuck up a number (validly)
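A quick Python harness for trying the phone pattern against sample input. The regex is the one from the comment, unchanged apart from dropping the /.../ delimiters; `is_us_phone` is a made-up name, and the sample numbers use full four-digit line numbers.

```python
import re

PHONE_RE = re.compile(
    r"^[01]?[-\s\.]?\(?[2-9][0-9]{2}\)?"      # optional 0/1 prefix, area code
    r"([-\s\.]|(\s-\s))?[0-9]{3}"             # separator, exchange
    r"([-\s\.]|(\s-\s))?[0-9]{4}"             # separator, line number
    r"\s?(([xX]\.?|(ext|EXT)\.?|\s)?\s?(?<![0-9])[0-9]{1,4})?$"  # extension
)


def is_us_phone(text):
    """True if text matches the US phone pattern from the thread."""
    return PHONE_RE.match(text) is not None


if __name__ == "__main__":
    for sample in ["333 444 5555", "1-(340)333 5678", "333 444 5555 x3456"]:
        print(sample, is_us_phone(sample))
```

The pieces are split across adjacent string literals purely for readability; Python joins them into the original single pattern.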

some that I've used ... (4, Interesting)

ianare (1132971) | more than 5 years ago | (#25704111)

SSN
^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$

US phone with or without parentheses
^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$

ISO Date (19th to 21st century only)
^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
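Two of these are worth a closer look in code. The SSN pattern uses the backreference `\3` so that the second separator has to be the same as the first, and the date pattern only checks the shape of the date, not the calendar. A Python sketch, with function names chosen for illustration:

```python
import re

# Group 3 captures the first separator (space, hyphen or nothing);
# \3 then forces the second separator to be identical.
SSN_RE = re.compile(
    r"^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$"
)

# Years 1800-2099, months 01-12, days 01-31.
ISO_DATE_RE = re.compile(
    r"^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$"
)


def is_ssn(text):
    """True if text has the shape of an SSN with consistent separators."""
    return SSN_RE.match(text) is not None


def is_iso_date(text):
    """True if text looks like YYYY-MM-DD; month lengths are not checked."""
    return ISO_DATE_RE.match(text) is not None
```

Since month lengths are ignored, something like 2008-02-31 still passes; a date library is the safer choice when that matters.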

Not a trick, but a question. (2, Interesting)

Janek Kozicki (722688) | more than 5 years ago | (#25704167)

I was wondering with my friend someday if it's possible with regex to select a pattern which occurs twice or more times repeatedly in single line but is separated by undefined characters. For example I want to select only lines in which the same pattern "[FB][ot]o" occurs exactly two times (in example below . is any character, for clarity):
 
...Foo... - is not selected
...Foo...Bto... - is not selected
...Bto...Bto... - is selected

a normal /[FB][ot]o.*[FB][ot]o/ would select the second and third case. But I only want the third case. The first occurrence would define my pattern, and second occurrence must exactly match it. Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.

Re:Not a trick, but a question. (4, Informative)

natebarney (987940) | more than 5 years ago | (#25704453)

Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.

In perl, I did /([FB][ot][o]).*\1/ and it seemed to work as you wanted. Also, if you're using a regex engine that supports lazy (non-greedy) quantifiers like perl does, I would use them in this case. It reduces backtracking. In perl, put a ? after the *.
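The same idea in Python, as a small sketch: capture the first occurrence, then require that exact text again with `\1`, using the lazy `.*?` in between. The function name `repeats_token` is made up for this example.

```python
import re

# Group 1 is whichever of Foo/Fto/Boo/Bto comes first; \1 must repeat it exactly.
REPEAT_RE = re.compile(r"([FB][ot]o).*?\1")


def repeats_token(line):
    """True if some [FB][ot]o token occurs at least twice on the line."""
    return REPEAT_RE.search(line) is not None
```

This accepts two or more occurrences; limiting it to exactly two would need an extra check, for instance counting the matches of the captured token.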

Handy links (2, Informative)

Kozz (7764) | more than 5 years ago | (#25704185)

While I'm not providing any specific trick per se, here are a few useful on-topic links:
http://www.regular-expressions.info/ [regular-expressions.info] - this one is handy for regex info particularly in Javascript which I use so infrequently I need to know how to match, capture, substitute, etc.
http://perldoc.perl.org/perlre.html [perl.org] - plenty of regex info there which is Perl specific, but of course extends to many other similar implementations

Ignore whitespace (1)

Wildclaw (15718) | more than 5 years ago | (#25704229)

The number one trick for regular expressions is not a regular expression at all. It is simply the habit of always using the ignore whitespace flag to format and comment your regular expressions. Code maintenance and general readability is simply a must for any real developer.

One liners are for show, not for actual usage.
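In Python the ignore-whitespace flag is `re.VERBOSE`. Here is a minimal sketch of what a commented pattern can look like, using a MAC address check as the example; the names are assumptions for illustration.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and # starts a comment.
MAC_VERBOSE_RE = re.compile(
    r"""
    ^
    (?: [0-9a-fA-F]{2} : ){5}   # five "XX:" groups
    [0-9a-fA-F]{2}              # final "XX" with no trailing colon
    $
    """,
    re.VERBOSE,
)


def is_mac_verbose(text):
    """True if text looks like XX:XX:XX:XX:XX:XX in hex."""
    return MAC_VERBOSE_RE.match(text) is not None
```

A literal space inside a verbose pattern has to be escaped or put in a character class, which is the main thing to watch for when converting a one-liner.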
