Is it possible for the themoviedb scraper to ignore a prefix?

Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #1
As not all movie series sort into the correct order alphabetically (an extreme example is the 26-movie Zatoichi collection), I would like to have a prefix in the filename in Windows Explorer.
However, I'm having trouble finding a prefix that themoviedb scraper ignores, so that it still finds my movies.
For some series, my prefix works fine (e.g.: James Bond), but for others it does not find most movies anymore (Zatoichi, Mad Max and many more).

I have tried many different prefixes, but none work well:

1-moviename
2-moviename
3-moviename
...

or

[1] moviename
[2] moviename
[3] moviename
...

or

1979 moviename
1981 moviename
1985 moviename

So my questions:
* does anyone know a prefix that might work?
* does anyone know a hack / patch so that themoviedb scraper can ignore the prefix? (e.g.: like some regex in advancedsettings.xml)

Thanks!
(This post was last modified: 2014-02-26 12:29 by Mastakilla.)
Prof Yaffle Offline
Donor
Posts: 1,210
Joined: Mar 2011
Reputation: 28
Location: UK - in the middlish (mostly).
Post: #2
I've done variations of (1) and (2) with no problems... I've simply numbered the films, made sure the date is there, and off it went, e.g. 1. moviename [year].

You can also specify the IMDb reference in a manual search, which solves a multitude of lookup problems.
Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #3
1. Mad.Max.1979.1080p.DTS.HDMA --> not found
2. Mad.Max.2.1981.1080p.AC3.5.1.HQ --> not found
3. Mad.Max.Beyond.Thunderdome.1985.1080p.BluRay.x264-CiNEFiLE --> found

while

Mad.Max.1979.1080p.DTS.HDMA --> found
Mad.Max.2.1981.1080p.AC3.5.1.HQ --> found
Mad.Max.Beyond.Thunderdome.1985.1080p.BluRay.x264-CiNEFiLE --> found
Prof Yaffle Offline
Donor
Posts: 1,210
Joined: Mar 2011
Reputation: 28
Location: UK - in the middlish (mostly).
Post: #4
What about "1. Mad Max [1979] - DTS HDMA" or variations? I wonder if the dots are confusing things as delimiters. Or "1 - Mad.Max.....". Or "1 - Mad.Max [1979] ....".
Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #5
Thanks for the suggestion!

But that kinda would mess up my entire naming convention :-/
I prefer keeping the movie names as they are... only the prefix is changeable...

I don't really feel like renaming 1500 movies today ;)

The dots work fine in all other situations (without prefix) though
(This post was last modified: 2014-02-26 15:00 by Mastakilla.)
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #6
This can't be done without editing the scraper.

Have a look at this thread for the basic idea.
Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #7
Thanks for the tip!!

After many hours of messing around, I'm finally getting somewhere, but I'm still having issues getting it right...

I have modified my <CreateSearchUrl>, but I'm having trouble getting the regex right.
Here is a working one:
Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="\1" dest="1">
            <expression noclean="1">\[[0-9]\]_(.*)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>
It works for a folder like "[1]_Mad.Max.1979.1080p.DTS.HDMA"

However, I would like the following to be possible:
"[1] Mad.Max.1979.1080p.DTS.HDMA"
and
"[10] Mad.Max.1979.1080p.DTS.HDMA"

The following regexes do not work for allowing the space:
<expression noclean="1">\[[0-9]\] (.*)</expression>
<expression noclean="1">\[[0-9]\]\s(.*)</expression>
<expression noclean="1">\[[0-9]\]%20(.*)</expression>
<expression noclean="1">\[[0-9]\]+(.*)</expression>

The following regexes do not work for allowing two digits:
<expression noclean="1">\[[0-9]+\]_(.*)</expression>
<expression noclean="1">\[[0-9]{1,2}\]_(.*)</expression>
<expression noclean="1">\[[0-9][0-9]*\]_(.*)</expression>

I also can't view what is being fed to buffer 1 ($$1), so it is very hard to debug...
The link to scrap on this page does not work anymore:
http://wiki.xbmc.org/index.php?title=HOW...a_scrapers

Can anyone help me out?
(This post was last modified: 2014-02-28 17:33 by Mastakilla.)
Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #8
Seems like I'm not that far yet :(

The incomplete regex that I thought was working actually isn't working very well yet :(

<expression noclean="1">\[[0-9]\]_(.*)</expression>
recognizes
[1]_Mad.Max.1979.1080p.DTS.HDMA
but does NOT recognize
[5]_Mad.Max.1979.1080p.DTS.HDMA

I don't understand it....

Does anyone know how to display or log the input and the output of the <createsearchurl>?
(This post was last modified: 2014-02-28 17:54 by Mastakilla.)
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #9
If you have debug logging turned on then you should be able to see what is in $$1 buffer, as it gets passed directly as the query parameter of the URL (assuming the added clean-up regex doesn't match).

The third space regex is the one that makes sense (spaces get percent-encoded). All of the 2-number regexes look valid.
Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #10
ah yes, debug does log these kinds of things... thanks!

eg:
with Regex <expression noclean="1">\[[0-9]\]_(.*)</expression>
and movie [5]_Mad.Max.1979.1080p.DTS.HDMA
Code:
17:10:08 T:8700   DEBUG: VideoInfoScanner: Scanning dir 'D:\Videos\test\Mad Max Series (NL Subbed)\[5]_Mad.Max.1979.1080p.DTS.HDMA\' as not in the database
17:10:08 T:8700   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'D:\Videos\test\Mad Max Series (NL Subbed)\[5]_Mad.Max.1979.1080p.DTS.HDMA\Mad.Max.1979.1080p.DTS.HDMA.mkv'
17:10:08 T:8700   DEBUG: ADDON::CScraper::FindMovie: Searching for '[5] Mad Max' using The Movie Database scraper (path: 'C:\Users\Mastakilla\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.6')
17:10:08 T:8700   DEBUG: scraper: CreateSearchUrl returned <url>http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=%5b5%5d%20mad%20max&amp;year=1979&amp;language=en</url>
17:10:08 T:8700   DEBUG: CurlFile::Open(09DFFAA8) http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=%5b5%5d%20mad%20max&year=1979&language=en
17:10:08 T:8700   DEBUG: scraper: GetSearchResults returned <results></results>
17:10:08 T:8700   DEBUG: ADDON::CScraper::FindMovie: Searching for '[5]_Mad.Max' using The Movie Database scraper (path: 'C:\Users\Mastakilla\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.6')
17:10:08 T:8700   DEBUG: scraper: CreateSearchUrl returned <url>http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=%5b5%5d_mad.max&amp;year=1979&amp;language=en</url>
17:10:08 T:8700   DEBUG: CurlFile::Open(09DFFAA8) http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=%5b5%5d_mad.max&year=1979&language=en
17:10:08 T:8700   DEBUG: scraper: GetSearchResults returned <results></results>
17:10:08 T:8700 WARNING: No information found for item 'D:\Videos\test\Mad Max Series (NL Subbed)\[5]_Mad.Max.1979.1080p.DTS.HDMA\Mad.Max.1979.1080p.DTS.HDMA.mkv', it won't be added to the library.
17:10:08 T:8700   DEBUG: VideoInfoScanner: No (new) information was found in dir D:\Videos\test\Mad Max Series (NL Subbed)\[5]_Mad.Max.1979.1080p.DTS.HDMA\
With the same regex and movie
[1]_Mad.Max.1979.1080p.DTS.HDMA
Code:
17:18:05 T:5784   DEBUG: VideoInfoScanner: Scanning dir 'D:\Videos\test\Mad Max Series (NL Subbed)\[1]_Mad.Max.1979.1080p.DTS.HDMA\' as not in the database
17:18:05 T:5784   DEBUG: CVideoDatabase::GetMovieId (D:\Videos\test\Mad Max Series (NL Subbed)\[1]_Mad.Max.1979.1080p.DTS.HDMA\Mad.Max.1979.1080p.DTS.HDMA.mkv), query = select idMovie from movie where idFile=7584
17:18:05 T:5784   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'D:\Videos\test\Mad Max Series (NL Subbed)\[1]_Mad.Max.1979.1080p.DTS.HDMA\Mad.Max.1979.1080p.DTS.HDMA.mkv'
17:18:05 T:5784   DEBUG: ADDON::CScraper::FindMovie: Searching for '[1] Mad Max' using The Movie Database scraper (path: 'C:\Users\Mastakilla\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.6')
17:18:05 T:5784   DEBUG: scraper: CreateSearchUrl returned <url>http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=%5b1%5d%20mad%20max&amp;year=1979&amp;language=en</url>
17:18:05 T:5784   DEBUG: CurlFile::Open(0B2B06F0) http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=%5b1%5d%20mad%20max&year=1979&language=en
17:18:05 T:5784    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://api.tmdb.org
17:18:05 T:5784   DEBUG: scraper: GetSearchResults returned <results><entity><title>Mad Max</title><id>9659</id><year>1979</year>
From this it becomes clear that my regex doesn't do ANYTHING :( (you can see that the URL still contains the prefix in both cases, even when it finds the movie)

Does anyone have an idea what I'm doing wrong?
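
To check quickly whether a prefix survived into the search, the `query` parameter can be pulled out of a `CreateSearchUrl returned` log line and decoded. This is a minimal Python sketch, not part of Kodi or the scraper. The helper name `logged_query` is made up for illustration, and the API key in the sample line is replaced by a placeholder.

```python
import html
import re
from urllib.parse import parse_qs, urlsplit


def logged_query(log_line: str) -> str:
    """Return the decoded `query` parameter from a 'CreateSearchUrl returned' log line."""
    match = re.search(r"<url>(.*?)</url>", log_line)
    if match is None:
        raise ValueError("no <url> element in log line")
    # The debug log shows '&amp;' where the real URL has '&'.
    url = html.unescape(match.group(1))
    # parse_qs also undoes the percent-encoding (%5b -> '[', %20 -> ' ').
    return parse_qs(urlsplit(url).query)["query"][0]


# Sample line modelled on the log above; 'XXXX' is a placeholder API key.
line = (
    "17:10:08 T:8700   DEBUG: scraper: CreateSearchUrl returned "
    "<url>http://api.tmdb.org/3/search/movie?api_key=XXXX"
    "&amp;query=%5b5%5d%20mad%20max&amp;year=1979&amp;language=en</url>"
)
print(logged_query(line))  # [5] mad max
```

If the printed value still starts with the prefix, the clean-up regex did not match.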
(This post was last modified: 2014-02-28 18:22 by Mastakilla.)
Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #11
I also just tried with the regexp nested inside the main regexp, but it still doesn't work :(
Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <RegExp input="$$1" output="\1" dest="1">
                <expression noclean="1">\[[0-9]+\]%20(.*)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #12
Looking at the output, it looks like the square brackets are also being percent-encoded, so you'd want a regex like:
Code:
<expression noclean="1">%5b[0-9]+%5d%20(.*)</expression>
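
The effect of the percent-encoding on the pattern can be tried outside Kodi. This is a rough Python sketch, assuming the scraper sees the title already percent-encoded as in the log above. Python's `quote` emits upper-case hex (`%5B`) whereas the log shows lower case (`%5b`), so the sketch matches case-insensitively. `strip_prefix` is a hypothetical helper, not scraper code.

```python
import re
from urllib.parse import quote

# Same pattern as suggested above, compiled case-insensitively (see note on hex case).
PREFIX = re.compile(r"%5b[0-9]+%5d%20(.*)", re.IGNORECASE)


def strip_prefix(encoded: str) -> str:
    """Drop a leading '[N] ' prefix from a percent-encoded title, if present."""
    match = PREFIX.search(encoded)
    return match.group(1) if match else encoded


for title in ("[1] Mad Max", "[11] Mad Max", "Mad Max"):
    encoded = quote(title, safe="")
    print(f"{title!r} -> {encoded} -> {strip_prefix(encoded)}")

# A pattern written against the unencoded text finds nothing in the encoded string.
print(re.search(r"\[[0-9]+\] (.*)", quote("[1] Mad Max", safe="")))  # None
```

All three titles end up as `Mad%20Max`, including the two-digit prefix.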
Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #13
good point! thanks!

but unfortunately still not working :(

Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="\1" dest="1">
            <expression noclean="1">%5b[0-9]+%5d%20(.*)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #14
Aww crap. I just tested it... I forgot about an inescapable bit of core code - underscores are always converted to spaces, but the periods are only converted to spaces if there are no actual spaces in the name, otherwise they are left as-is.

So, "[1]_Mad.Max.1979.1080p.DTS.HDMA" will get cleaned up to "[1] Mad Max" and then get percent-encoded to "%5b1%5d%20Mad%20Max" for the scraper.

Whereas "[1] Mad.Max.1979.1080p.DTS.HDMA" will get cleaned up to "[1] Mad.Max" and then get percent-encoded to "%5b1%5d%20Mad.Max".

Using the underscore, you can clean to "Mad%20Max" and get a match, but with the space you'd be left with "Mad.Max", which doesn't.

No easy way around that.

The code you posted worked for me using underscores.

Relevant lines from the debug log:
Code:
16:43:20 T:7140   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'E:\Videos\Test\[1]_Mad.Max.1979.1080p.DTS.HDMA\movie.disc'
...
16:43:20 T:8124   DEBUG: ADDON::CScraper::FindMovie: Searching for '[1] Mad Max' using The Movie Database scraper (path: 'C:\Users\ScudLee\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.7.6')
16:43:20 T:8124   DEBUG: scraper: CreateSearchUrl returned <url>http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;query=Mad%20Max&amp;year=1979&amp;language=en</url>
16:43:20 T:8124   DEBUG: CurlFile::Open(03776660) http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=Mad%20Max&year=1979&language=en
16:43:20 T:8124   DEBUG: CScraperUrl::Get: Using "UTF-8" charset for "http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&query=Mad%20Max&year=1979&language=en"
16:43:20 T:8124   DEBUG: scraper: GetSearchResults returned <results><entity><title>Mad Max</title><id>9659</id><year>1979</year><url cache="tmdb-en-9659.json">http://api.tmdb.org/3/movie/9659?api_key=57983e31fb435df4df77afb854740ea9&amp;language=en</url></entity><entity><title>Mad Max</title><id>9659</id><year>1979</year><url cache="tmdb-en-9659.json">http://api.tmdb.org/3/movie/9659?api_key=57983e31fb435df4df77afb854740ea9&amp;language=en</url></entity></results>
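
The clean-up rule described in this post (underscores always become spaces, periods only when the name has no real spaces) can be sketched in Python. This is an approximation for illustration, not Kodi's actual code. `clean_title` is a made-up name, and the year-stripping regex is a simplified stand-in for whatever Kodi applies before the title reaches the scraper.

```python
import re


def clean_title(name: str) -> str:
    """Approximate the title clean-up described above (illustrative only)."""
    # Assumption: everything from the first 4-digit year onwards is cut off,
    # since the year is passed to the scraper separately.
    title = re.split(r"[._ ](?:19|20)\d{2}(?=[._ ]|$)", name, maxsplit=1)[0]
    # Check for real spaces before any replacement is made.
    had_spaces = " " in title
    # Underscores are always converted to spaces.
    title = title.replace("_", " ")
    # Periods are converted only if the name contained no real spaces.
    if not had_spaces:
        title = title.replace(".", " ")
    return title


for folder in (
    "[1]_Mad.Max.1979.1080p.DTS.HDMA",
    "[1] Mad.Max.1979.1080p.DTS.HDMA",
    "[1].Mad.Max.1979.1080p.DTS.HDMA",
):
    print(f"{folder!r} -> {clean_title(folder)!r}")
```

Under these assumptions the underscore and period prefixes both clean to `[1] Mad Max`, while the space prefix leaves `[1] Mad.Max`, which is the form that fails to match.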
Mastakilla Offline
Junior Member
Posts: 25
Joined: Dec 2013
Reputation: 0
Post: #15
Thanks for that extremely crucial bit of information.
That explains a lot...

I'm now using the following (and it works!) :
Code:
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="\1" dest="1">
            <expression noclean="1">%5b[0-9]+%5d%20(.*)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>

I'm using the following prefixes now
[1].Mad.Max.1979.1080p.DTS.HDMA
[2].Mad.Max.2.1981.1080p.AC3.5.1.HQ
etc

It also works for multi-digit numbers like [11].

Thanks again for the support!