2006-06-20, 23:03
Hi guys,
I really enjoy the IMDb lookup feature (like the rest of XBMC), but I think the IMDb filename matching is not up to the standard of the rest of XBMC (as the IMDb code itself admits :D).
Currently, the year is sent to IMDb as part of the filename, which hinders matching. Sent separately, it would instead help weed out titles that exist in multiple years.
If possible, I'd suggest using a list of regexps matched top-down, with named groups identifying the movie name and year parts ('n' and 'y' below). The name can be matched (and shortened) multiple times, but the year is filled only once: after a year match, any further regexp containing a 'y' group is skipped.
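A minimal Python sketch of that top-down scheme, assuming Python's `re` named-group syntax (`(?P<n>...)`, `(?P<y>...)`) and a single pass over the list. The function name `clean_title` is hypothetical, not anything from the XBMC code.

```python
import re
from typing import Optional, Sequence, Tuple


def clean_title(name: str, patterns: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Apply regexes top-down; group 'n' holds the (shortened) name, 'y' the year."""
    year: Optional[str] = None
    for pattern in patterns:
        rx = re.compile(pattern)
        # Once a year has been captured, skip every pattern that has a 'y' group.
        if year is not None and "y" in rx.groupindex:
            continue
        m = rx.match(name)
        if m is None:
            continue  # pattern does not apply; keep the name unchanged
        name = m.group("n")  # every pattern is assumed to define an 'n' group
        if "y" in rx.groupindex and m.group("y") is not None:
            year = m.group("y")
    return name, year
```

With a toy pattern `p = r"^(?P<n>.+)\+(?P<y>\d{4})$"` (not one of the defaults), `clean_title("foo+1999+2005", [p, p])` returns `("foo+1999", "2005")`: the second pass is skipped because a year is already known.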
The following regexps could be used as defaults. They should work on 99% of files that follow 'scene' naming conventions, and on most non-scene ones. They keep most 'confusing' movie names like 'THX 1138' and '2001' intact. Of course, there are always exceptions to the rule...
Assuming all whitespace (perhaps configurable as well?) has already been replaced with '+' and all text is lowercase:
1. Remove '[' and everything after it:
^(?P<n>.+?)\[.*$
2. Remove any scene tag and everything after it. I think I've included most tags, and only complete ones are matched, so 'seoul' will not be matched by the 'se' tag below. The pattern is split over two lines here due to length; join them into one:
^(?P<n>.+?)\+(ac3|custom|dc|divx|dsr|dsrip|dvd|dvdrip|dvdscr|fragment|fs|hdtv|internal|limited|
multisubs|ntsc|ogg|ogm|pal|pdtv|proper|repack|rerip|retail|se|svcd|swedish|unrated|ws|xvid)(\+.*)?$
3. Split off the year, if present. Optional parentheses around it are accepted:
^(?P<n>.+?)\+\(?(?P<y>(19[0-9]{2}|200[0-9]))\)?(\+.*)?$
The year group matches 1900 through 2009; after that, the pattern needs a simple change.
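To check the three default patterns together, here is a small Python sketch. It assumes Python's `(?P<name>...)` spelling of named groups, a hypothetical `normalize` helper that does the lowercasing and whitespace-to-'+' step the post takes for granted, and (also an assumption) trimming of the trailing '+' that step 1 leaves behind.

```python
import re
from typing import Optional, Tuple

# Scene tags from step 2, joined back into one alternation.
SCENE_TAGS = (
    "ac3|custom|dc|divx|dsr|dsrip|dvd|dvdrip|dvdscr|fragment|fs|hdtv|internal|limited|"
    "multisubs|ntsc|ogg|ogm|pal|pdtv|proper|repack|rerip|retail|se|svcd|swedish|"
    "unrated|ws|xvid"
)
DEFAULT_PATTERNS = [
    r"^(?P<n>.+?)\[.*$",                                         # 1. cut at '['
    r"^(?P<n>.+?)\+(" + SCENE_TAGS + r")(\+.*)?$",               # 2. cut at a scene tag
    r"^(?P<n>.+?)\+\(?(?P<y>(19[0-9]{2}|200[0-9]))\)?(\+.*)?$",  # 3. split off the year
]


def normalize(raw: str) -> str:
    """Hypothetical helper: lowercase and replace whitespace runs with '+'."""
    return re.sub(r"\s+", "+", raw.strip().lower())


def split_title(raw: str) -> Tuple[str, Optional[str]]:
    name, year = normalize(raw), None
    for pattern in DEFAULT_PATTERNS:
        rx = re.compile(pattern)
        if year is not None and "y" in rx.groupindex:
            continue  # a year was already found; never capture a second one
        m = rx.match(name)
        if m:
            name = m.group("n")
            if "y" in rx.groupindex:
                year = m.group("y")
    # Assumption: a trailing '+' (e.g. left by step 1) is trimmed before lookup.
    return name.strip("+"), year
```

For example, `split_title("2001 A Space Odyssey (1968) DVDRip XviD")` gives `("2001+a+space+odyssey", "1968")`, while `split_title("THX 1138")` leaves the name intact as `("thx+1138", None)`, since 1138 is outside the year range.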
Cheers,
ezd