Kodi Community Forum

Full Version: Questions around XBMC specific REGEX
Posted here as I could not decide which forum was appropriate. Feel free to move it.

I have some questions around XBMC specific implementation of REGEX.

1. I was certain XBMC didn't support advanced metacharacters, e.g. \d

However this wiki page is full of them:

http://wiki.xbmc.org/?title=Regular_Expr...)_Tutorial

Is this just a work in progress? If so, it really should carry a prominent warning, as almost every example would not be appropriate for XBMC.

Edit: If XBMC really is fully PCRE compatible these examples are perfect.
Edit: Confirmed that XBMC now supports regex metacharacters it previously did not, such as \d



2. I thought that TV show matching only needed two () matches; one for season and one for episode.

However the default set listed on the wiki here:

http://wiki.xbmc.org/?title=Advancedsett...atching.3E

e.g. <regexp>\[[Ss]([0-9]+)\]_\[[Ee]([0-9]+)([^\\/]*)</regexp> <!-- foo_[s01]_[e01] -->

has 3 matches. Why 3 group matches?

Edit: Confirmed here and on IRC (thanks cpt) that the 3rd match is for two-part episode matching.



3. This fragment appears all over the place at the end of most TV matching regexes: [^\\/]*

I understand what it means but why is such a greedy match like this needed?

Edit: Explained by answer 2

4. Several forum examples by XBMC devs use case-sensitive matching, e.g. [a-zA-Z].

I was under the impression that strings were converted to lower case by the code prior to being REGEX matched?

Edit: Over time XBMC has been improved. Now you don't need to care about case at all, since everything is forced to lower case prior to a match.

5. Can someone point me at an explanation of two-part matching? I was sure there was a guide somewhere but I just can't find it now.

Edit: The only real docs on this are in the source comments
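
To make the answers to questions 2 and 3 concrete, here is a minimal sketch using Python's `re` module as a stand-in for the matcher (an assumption: XBMC itself is not Python, and the file names are made up for illustration). It runs the wiki pattern quoted in question 2 and shows what each of the three groups captures, including how `[^\\/]*` stops at a path separator.

```python
import re

# Pattern copied from the wiki example quoted in question 2.
TV_PATTERN = re.compile(r"\[[Ss]([0-9]+)\]_\[[Ee]([0-9]+)([^\\/]*)")


def split_episode(path):
    """Return (season, episode, remainder), or None if the pattern does not match."""
    m = TV_PATTERN.search(path)
    if m is None:
        return None
    season, episode, remainder = m.groups()
    return int(season), int(episode), remainder


# Hypothetical file names, not taken from any real library.
print(split_episode("foo_[s01]_[e01].avi"))             # (1, 1, '].avi')
print(split_episode("/tv/foo_[s01]_[e02]/extras.avi"))  # (1, 2, ']')
```

Groups 1 and 2 are the season and episode. Group 3 is whatever follows the episode number, up to but not including the next `/` or `\`, so the leftover text stays within the same file or directory name.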
Just a quick tip.

There is a tool for testing regular expressions called kiki; it's available in the Ubuntu repositories.

I'm not sure but I believe this is the one: http://code.google.com/p/kiki-re/ and it has win32 downloads.
Thanks for the tip. I actually use RegexBuddy, which I find quite comprehensive.

I don't really have a problem constructing the actual regex; the problem is more the XBMC constraints and specific implementation, plus the discrepancies between examples.
IIRC XBMC uses PCRE now as well, which brings a new set of expressions.

http://en.wikipedia.org/wiki/PCRE

But I really am a novice with regular expressions, it's all magic to me :)
Topfs2 Wrote: IIRC XBMC uses PCRE now as well, which brings a new set of expressions.

http://en.wikipedia.org/wiki/PCRE

But I really am a novice with regular expressions, it's all magic to me :)

That's very interesting indeed. In theory this means we can replace all the fixed ranges with the built-in shorthand classes, e.g. [0-9] with \d.
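
As a rough check of that idea, the sketch below compares the range form with the `\d` form on a few made-up file names. It uses Python's `re` module rather than libpcre (an assumption made purely for convenience), with `re.ASCII` so that `\d` means exactly `[0-9]`.

```python
import re

# Range-based pattern in the style of the XBMC defaults.
OLD = re.compile(r"[Ss]([0-9]+)[\.-]?[Ee]([0-9]+)")
# Same pattern with the \d shorthand. re.ASCII restricts \d to 0-9;
# without it, Python's \d would also accept other Unicode digits.
NEW = re.compile(r"[Ss](\d+)[\.-]?[Ee](\d+)", re.ASCII)

# Hypothetical file names used only for this comparison.
samples = [
    "the.unit.s02e03.dvdrip.xvid-sort.avi",
    "Show.S10.E07.avi",
    "show-s1-e2.mkv",
    "no.episode.here.avi",
]


def groups(pattern, name):
    """Return the captured groups, or None when there is no match."""
    m = pattern.search(name)
    return m.groups() if m else None


results = [(groups(OLD, name), groups(NEW, name)) for name in samples]
for name, (old, new) in zip(samples, results):
    print(name, old, new, "same" if old == new else "DIFFERENT")
```

On these samples both forms capture the same groups, which is what you would expect if the engine treats `\d` as a plain ASCII digit class.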
Feel free to give it a go and see what happens.

As for the third (), it's for the multi-episode stuff - it allows you to specify the portion of the string to run the third regexp on. At least I *think* that's what it's for :)

Cheers,
Jonathan
A small update:

XBMC does indeed support REGEX metachars it never used to e.g. \d

This is excellent, if for no other reason than that I no longer have to remember XBMC's regex limitations.

Can anyone else help with definitive statements on the other questions? I have some guesses, but I no longer want to guess before I start this work.

ta
xbmc can do perl compatible regexp these days, i.e. we use libpcre.
thanks spiff.

Can you clear up my confusion?

From the debug log (obfuscated slightly):


12:00:46 T:712 M:2610446336 DEBUG: found match smb://server/tv/u/the unit/season 2/the.unit.s02e03.dvdrip.xvid-sort.avi (s02e03) [[Ss]([0-9]+)[\.-]?[Ee]([0-9]+)([^\\/]*)$]

The actual filename is: The.Unit.S02E03.DVDRip.XviD-SORT.avi

So something doesn't add up here. The regex is case sensitive, but the log entry seems to suggest the matching text is forced to lowercase first?

Also, is the third greedy group match ([^\\/]*) something to do with two-part episode matching, i.e. it captures this text, which is then fed to a second regex?
yes, we lowercase everything prior to running the expressions on them. it was a poor man's case insensitive matching while we didn't have the option (now we can just put a flag in the parser). yes, the third match is what we will continue matching on in case of multi-episode files.
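
For anyone reading later, the two points above can be sketched roughly like this in Python. This is an illustration under stated assumptions, not the XBMC source: `find_episodes` is a made-up helper, and the `MULTI` follow-up pattern is hypothetical, chosen only to show how matching could continue on the third group.

```python
import re

# Deliberately lowercase-only, to show that lowercasing the input first
# is what lets it match an upper-case file name.
EPISODE = re.compile(r"s([0-9]+)[\.-]?e([0-9]+)([^\\/]*)$")
# Hypothetical follow-up pattern, applied to the remainder (group 3).
MULTI = re.compile(r"^[-_ex]+([0-9]+)")


def find_episodes(path):
    """Return a list of (season, episode) pairs found in the path."""
    lowered = path.lower()  # the "poor man's" case-insensitive matching
    m = EPISODE.search(lowered)
    if m is None:
        return []
    season = int(m.group(1))
    episodes = [int(m.group(2))]
    remainder = m.group(3)
    # Keep matching on the remainder while it starts with another episode marker.
    while True:
        extra = MULTI.search(remainder)
        if extra is None:
            break
        episodes.append(int(extra.group(1)))
        remainder = remainder[extra.end():]
    return [(season, episode) for episode in episodes]


print(find_episodes("The.Unit.S02E03.DVDRip.XviD-SORT.avi"))  # [(2, 3)]
print(find_episodes("Show.S01E01E02.avi"))                    # [(1, 1), (1, 2)]
```

The second file name is invented for the example. Without the `lower()` call, the lowercase-only `EPISODE` pattern would not match either name.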