IMDB filename/year regexp matching (sample inside)
#1
hi guys,

i seriously enjoy the imdb lookup feature (like the rest of xbmc), but i think the quality of the imdb filename matching is not up to the standard of the rest of xbmc (as mentioned in the imdb code itself).

currently, the year is sent to imdb as part of the filename, which hinders matches, whereas handled as a separate value it would help weed out titles that appear in multiple years.

if possible, i'd suggest using a list of regexps applied top-down, with named groups identifying the movie name and year parts ('n' and 'y' below). the filename can be matched (and shortened) multiple times, but the year can be filled only once, so any further regexps containing the 'y' group are skipped after a year match.

the following regexps could be used by default. they should work on 99% of files using 'scene' naming conventions, and on most non-scene ones. they keep most 'confusing' movie names like 'thx 1138' and '2001' intact. of course there are always exceptions to the rule...

assuming all whitespace (perhaps also configurable?) has already been replaced with '+' characters, and all text is lowercase:

1. remove anything after '[':
Quote: ^(?P<n>.+?)\[.*$

2. remove anything after a scene tag (i think i've included most of them; only complete tags are matched, so 'seoul' will not be matched by the 'se' tag below). the regexp is split over two lines here due to its length:
Quote: ^(?P<n>.+?)\+(ac3|custom|dc|divx|dsr|dsrip|dvd|dvdrip|dvdscr|fragment|fs|hdtv|internal|limited|
multisubs|ntsc|ogg|ogm|pal|pdtv|proper|repack|rerip|retail|se|svcd|swedish|unrated|ws|xvid)(\+.*)?$

3. split off the year, if it exists. optional ( and ) around the year are accepted.
Quote: ^(?P<n>.+?)\+\(?(?P<y>(19[0-9]{2}|200[0-9]))\)?(\+.*)?$

the year pattern is valid for 1900 through 2009; from 2010 on, a simple change will suffice.
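
to make the top-down idea concrete, here is a minimal python sketch of the three regexps above applied in order. it is only an illustration, not xbmc code: the names `TAGS`, `RULES` and `split_name_and_year` are made up for this example, and the lowercasing and space-to-'+' replacement are done inline as an assumption about the preprocessing.

```python
import re

# scene tags from regexp 2 above (hypothetical constant name)
TAGS = (
    "ac3|custom|dc|divx|dsr|dsrip|dvd|dvdrip|dvdscr|fragment|fs|hdtv|"
    "internal|limited|multisubs|ntsc|ogg|ogm|pal|pdtv|proper|repack|"
    "rerip|retail|se|svcd|swedish|unrated|ws|xvid"
)

# rules are tried top-down; 'n' is the (shortened) name, 'y' the year
RULES = [
    re.compile(r"^(?P<n>.+?)\[.*$"),
    re.compile(r"^(?P<n>.+?)\+(" + TAGS + r")(\+.*)?$"),
    re.compile(r"^(?P<n>.+?)\+\(?(?P<y>(19[0-9]{2}|200[0-9]))\)?(\+.*)?$"),
]


def split_name_and_year(filename):
    """Return (name, year); year is None when no rule found one."""
    # assumed preprocessing: lowercase, spaces replaced with '+'
    name = filename.lower().replace(" ", "+")
    year = None
    for rule in RULES:
        has_year_group = "y" in rule.groupindex
        # the year is filled only once: skip later rules that contain 'y'
        if year is not None and has_year_group:
            continue
        match = rule.match(name)
        if match:
            name = match.group("n")  # the name may be shortened repeatedly
            if has_year_group:
                year = match.group("y")
    return name, year
```

with this sketch, `split_name_and_year("THX 1138 1971 DVDRip XviD")` gives `('thx+1138', '1971')`, and `split_name_and_year("2001 DVDRip")` gives `('2001', None)`, so the 'confusing' titles survive.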

cheers,

ezd
#2
on the surface at least, this sounds like a great idea.

there's a test scraper app available in cvs (in the tools/ folder) that doesn't need the xdk or anything.

how about trying out your ideas on that and seeing how it goes?

cheers,
jonathan


#3
glad you like it!

i don't have visual studio at the moment, but my python test code sorted out my folders pretty effectively. i'll see if i can dig up a copy i can use. would visual c++ 2005 express edition do?

(edit: tried that and it unfortunately doesn't seem to work under vs2005...)



#4
done.

i've uploaded a patch to nightfalltech. it is not user-configurable yet, but in my tests it works quite a bit better than the old procedure.

unfortunately the regexp engine inside xbmc appears to be greedy-only (no lazy quantifiers such as '.+?'), which limited the usefulness of the regexps. currently the tag-removal regexp is applied repeatedly (the year-splitter is applied once).
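
for anyone curious how repeated matching gets around a greedy-only engine, here is a rough python sketch. it is not the actual patch: the names `TAG_RE`, `YEAR_RE` and `clean_greedy` are hypothetical, python's `re` is used with greedy quantifiers only to imitate the limitation, and bracket handling is left out. with a greedy `.+`, the tag regexp matches the last tag in the name, so it has to be applied again and again until no tag is left.

```python
import re

# same scene tags as in the first post (hypothetical constant name)
TAGS = (
    "ac3|custom|dc|divx|dsr|dsrip|dvd|dvdrip|dvdscr|fragment|fs|hdtv|"
    "internal|limited|multisubs|ntsc|ogg|ogm|pal|pdtv|proper|repack|"
    "rerip|retail|se|svcd|swedish|unrated|ws|xvid"
)

# greedy-only variants: '.+' instead of '.+?', numbered groups only
TAG_RE = re.compile(r"^(.+)\+(" + TAGS + r")(\+.*)?$")
YEAR_RE = re.compile(r"^(.+)\+\(?(19[0-9]{2}|200[0-9])\)?(\+.*)?$")


def clean_greedy(name):
    """Strip scene tags from the right, then split off the year once."""
    # a greedy match cuts at the LAST tag, so loop until none remain
    while True:
        match = TAG_RE.match(name)
        if match is None:
            break
        name = match.group(1)
    # the year-splitter is applied a single time
    year = None
    match = YEAR_RE.match(name)
    if match is not None:
        name, year = match.group(1), match.group(2)
    return name, year
```

the input is assumed to be lowercase already, with '+' in place of whitespace, as in the first post.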

hope you like it,

ezd