Journal gbulmash's Journal: Regular Expressions: A Bunch of Little Ones or One Big One?
So I have a piece on my blog about detecting mobile browsers and some PHP code to do it. One part of the code checks 67 text snippets against the User Agent string, and a visitor asked if the way I was doing it -- putting all the snippets in an array and regex matching them one at a time -- was more efficient than using one giant regex with all the snippets in it.
So I actually tested it. When a matching pattern was hit very early on in the text, the giant regex was faster. But if there was no match or the match was much deeper in the text, the batch of smaller regexes performed better. Essentially, the batch of smaller regexes took the same time to run whether or not there was a match or where it occurred in the examined text. The giant regex slowed down the farther the match happened in the examined text and was slowest when there was no match.
As the length of the examined text grew, particularly with no match, the disparity between the giant regex and the batch of regexes grew. At 760 characters and no matching text, the giant regex was taking around 4.25 times longer than the batch of regexes. At 1520 characters (around 250 words), the giant regex had slipped to 5.4 times longer. Yet if there was a match in the first 10 characters of the 1520-character string, the giant regex was the same speed as if the match had come in the first 10 characters of a string 1/8th the size.
So while the giant regex would be more computationally efficient against shorter strings or ones where you knew the match would come early, the batch of many smaller regexes is actually better as a rule of thumb. It's faster against larger blocks of text when the match is deeper or non-existent, and if the text grows, it's computational needs don't grow as fast as those of the giant regex.
So I actually tested it. When a matching pattern was hit very early on in the text, the giant regex was faster. But if there was no match or the match was much deeper in the text, the batch of smaller regexes performed better. Essentially, the batch of smaller regexes took the same time to run whether or not there was a match or where it occurred in the examined text. The giant regex slowed down the farther the match happened in the examined text and was slowest when there was no match.
As the length of the examined text grew, particularly with no match, the disparity between the giant regex and the batch of regexes grew. At 760 characters and no matching text, the giant regex was taking around 4.25 times longer than the batch of regexes. At 1520 characters (around 250 words), the giant regex had slipped to 5.4 times longer. Yet if there was a match in the first 10 characters of the 1520-character string, the giant regex was the same speed as if the match had come in the first 10 characters of a string 1/8th the size.
So while the giant regex would be more computationally efficient against shorter strings or ones where you knew the match would come early, the batch of many smaller regexes is actually better as a rule of thumb. It's faster against larger blocks of text when the match is deeper or non-existent, and if the text grows, it's computational needs don't grow as fast as those of the giant regex.
Regular Expressions: A Bunch of Little Ones or One Big One? More Login
Regular Expressions: A Bunch of Little Ones or One Big One?
Slashdot Top Deals