re.compile benefit?
#1
Alright, I have a question!

What is the benefit of plugins using re.compile when they don't even save the regular expression object to reuse later on anyways?

As an example, I've seen repeatedly in plugins:

Code:
match = re.compile('whatever').findall(html)[0]

This can be made easier to read (in my mind) and doesn't require the creation of an unnecessary and unused regular expression object:

Code:
match = re.findall('whatever', html)[0]

So why re.compile vs re.findall?

Sorry, I was just going through some source code and kept coming across this.
#2
No reason. You can attribute its spreading to cut and paste, I'm sure.
#3
If you need to re-use the regular expression (for example in a loop), it is faster to compile it once and use the compiled expression multiple times.
#4
(2012-08-30, 23:04)sphere Wrote: If you need to re-use the regular expression (for example in a loop), it is faster to compile it once and use the compiled expression multiple times.

Yes and no. Python maintains a cache of the most recently-used regular expressions, so for things like findall() there's a cache lookup first to find an already-compiled version. IIRC the cache size is 20.

So if you've got a few regular expressions used in a loop, and there's nothing else going on (no other threads), there will be no significant advantage to compiling first.
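
Roughly speaking, the module-level functions behave something like this. This is only a simplified sketch of the idea, not the stdlib's actual implementation; the name cached_match and the cache size are just illustrative:

Code:
import re

_cache = {}
_MAXCACHE = 20  # illustrative; the real limit depends on the Python version

def cached_match(pattern, string, flags=0):
    # Look up an already-compiled pattern before compiling again.
    key = (pattern, flags)
    compiled = _cache.get(key)
    if compiled is None:
        if len(_cache) >= _MAXCACHE:
            _cache.clear()  # the real cache is also emptied once it fills up
        compiled = re.compile(pattern, flags)
        _cache[key] = compiled
    return compiled.match(string)

So each call to re.match() or re.findall() still pays for the lookup, but not for a full recompile as long as the pattern stays in the cache.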

However, as a general rule I advise compiling most regular expressions and assigning them to a "constant" (all-uppercase name) for maintainability purposes - regexes can get quite complicated, and moving them out-of-band can improve the readability of the code.

Code:
TEST_RE = re.compile('^.*$')

...

for line in iterable:
    m = TEST_RE.match(line)
    ...
#5
(2012-08-31, 00:33)magao Wrote: Yes and no. Python maintains a cache of the most recently-used regular expressions, so for things like findall() there's a cache lookup first to find an already-compiled version. IIRC the cache size is 20.

I did a quick test:
PHP Code:
import timeit
import urllib2
import re

TEXT = urllib2.urlopen('http://www.google.com').read()
EXPRESSION = '<.*?>'
COUNT = 1000  # 10


def without_compile():
    for i in xrange(COUNT):
        for line in TEXT.split():
            re.match(EXPRESSION, line)


def with_compile():
    re_compiled = re.compile(EXPRESSION)
    for i in xrange(COUNT):
        for line in TEXT.split():
            re_compiled.match(line)


def with_compile_each():
    for i in xrange(COUNT):
        for line in TEXT.split():
            re_compiled = re.compile(EXPRESSION)
            re_compiled.match(line)


if __name__ == '__main__':
    print 'Testing with %d lines, looping %d times' % (len(TEXT.split()), COUNT)
    print 'Without compile:'
    print timeit.Timer("without_compile()", "from __main__ import without_compile").timeit(number=1)
    print 'With compile:'
    print timeit.Timer("with_compile()", "from __main__ import with_compile").timeit(number=1)
    print 'Without compile each time:'
    print timeit.Timer("with_compile_each()", "from __main__ import with_compile_each").timeit(number=1)

Results:
Code:
Testing with 295 lines, looping 1000 times
Without compile:
0.382516145706
With compile:
0.138736963272
Without compile each time:
0.384953022003

Testing with 295 lines, looping 10 times
Without compile:
0.00669097900391
With compile:
0.00268793106079
Without compile each time:
0.00645303726196

More than twice as fast - of course not in all cases - but there are cases :)

SCNR ;)
