How to use IMDB id from filename for scraper lookup?
#1
Question 
I want to update the movie scraper so that it uses IMDB ids that are present in the filename. If an IMDB id is recognized (e.g. tt01234567), this id should be used for the query to TheMovieDB instead of a title search.

Example:

Code:
11:13:12 T:2808 DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'E:\Filme.HD\Transformers 2 - Die Rache (2009) [tt1055369].mkv'
11:13:12 T:3576 DEBUG: Thread CVideoInfoDownloader start, auto delete: 0
11:13:12 T:3576 DEBUG: ADDON::CScraper::FindMovie: Searching for 'Transformers 2 - Die Rache' using OFDB scraper (path: 'C:\Users\MediaPortal\AppData\Roaming\XBMC\addons\metadata.ofdb.de', content: 'movies', version: '1.1.2')
11:13:12 T:3576 DEBUG: scraper: CreateSearchUrl returned <url>http://api.themoviedb.org/2.1/Movie.search/de/xml/57983e31fb435df4df77afb854740ea9/transformers%202%20%2d%20die%20rache+2009</url>
11:13:12 T:3576 DEBUG: FileCurl::Open(00FBD5E0) http://api.themoviedb.org/2.1/Movie....e%20rache+2009
11:13:12 T:3576 INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://api.themoviedb.org
11:13:13 T:3576 DEBUG: scraper: GetSearchResults returned <results></results>

I tried it with the following regexp, but it is not working because the id is stripped from $$1:

Code:
<CreateSearchUrl dest="3">
  <RegExp input="$$1" output="&lt;url&gt;http://api.themoviedb.org/2.1/Movie.imdbLookup/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1&lt;/url&gt;" dest="3">
    <expression noclean="1">\[(tt[0-9]+)\]</expression>
  </RegExp>
</CreateSearchUrl>

I can't get it to work. Unfortunately $$1 contains the "guessed" title from the filename and does not contain the [tt01234567] IMDB id any more.

$$2 contains the guessed year. But how do I get the original filename?

I tried $INFO[filename], $INFO[Listitem.FileName], $Listitem[FileName], ...

No luck, and the documentation doesn't help either.
http://wiki.xbmc.org/?title=Scrapers

Can anybody please help?
#2
Star 
Not sure this answer is what you want, but I can get a re-scrape to show the correct movie by using Refresh, selecting Manual, deleting the existing entry, and typing in the tt123456 (or whatever) code. I use the IMDB search to look up the code, or the year for the file name, if I have doubts.

I guess automating this process is what you're after?
#3
Yes, I want the process to be automatic. I'm on the verge of giving up. I tried to prevent the IMDB id from being stripped with the <cleanstrings> advanced setting, but that didn't work either. I really don't understand why this problem exists: the IMDB id is the best way to get a perfect scraping match without user intervention. With MediaPortal's Moving Pictures it works flawlessly.
#4
Based on the title in your debug log, cleanstrings isn't your problem... yet. Your first hurdle is cleandatetime.

cleandatetime basically assumes your file name is of the form: "Title [junk] (year) [more junk].ext" (more or less), and it splits the file name into "Title [junk]" and "year", dropping the "[more junk]" straight away.

Then cleanstrings goes to work on "Title [junk]" to turn it into "Title". And cleanstrings isn't surgical: it won't just remove matches, it'll trim the whole string from the point of the first match. So even if you changed the cleanstrings to allow the "[tt012345]", if it came after any other junk, it would still get dropped.
You basically need to have a file name in the format: "Title [imdb id] [junk] (year) [more junk].ext" for the id to have any chance of reaching the scraper.
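A rough Python model of the two steps as described in this post may make the order of operations clearer. The patterns below are hypothetical stand-ins written for illustration, not XBMC's shipped defaults, and the code simplifies the real cleaning logic: group 1 of the date match becomes the title, group 2 the year, and a junk match cuts the title from its start onward.

```python
import re

# Hypothetical stand-ins, NOT XBMC's shipped defaults.
# Title, separators, a 19xx/20xx year, then a separator or end of string.
YEAR_PATTERN = r"(.*[^ _.()\[\]\-])[ _.()\[\]\-]+((?:19|20)[0-9]{2})(?:[ _.()\[\]\-]|$)"
# One "junk" tag (resolution/codec) surrounded by separators.
JUNK_PATTERNS = [r"[ _.()\[\]\-](?:1080p|720p|x264|dts)(?:[ _.()\[\]\-]|$)"]


def clean_title(stem, year_pattern=YEAR_PATTERN, junk_patterns=JUNK_PATTERNS):
    """Simplified model of cleandatetime followed by cleanstrings."""
    title, year = stem, ""
    m = re.search(year_pattern, stem, re.IGNORECASE)
    if m:
        # Group 1 becomes the title; everything after the year match is dropped.
        title = m.group(1)
        year = (m.group(2) or "") if m.re.groups >= 2 else ""
    for pattern in junk_patterns:
        j = re.search(pattern, title, re.IGNORECASE)
        if j and j.start() > 0:
            # Not surgical: the title is cut from the first match onward.
            title = title[: j.start()]
    return title.strip(), year


print(clean_title("Transformers 2 - Die Rache (2009) [tt1055369]"))  # ('Transformers 2 - Die Rache', '2009')
print(clean_title("Movie 1080p [tt0000001] (2010)"))  # ('Movie', '2010')
print(clean_title("Movie [tt0000001] 1080p (2010)"))  # ('Movie [tt0000001]', '2010')
```

The last two calls show why the id has to come before any junk: in the second name, the cut at "1080p" takes the id with it.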
#5
Scudlee, thank you, that got me one step further!

I now have these entries in my advancedsettings.xml to just get the IMDB id over to the scraper.

Code:
    <video>
                <!-- Get IMDB id from filename into $1 for CreateSearchUrl -->
        <cleandatetime>\[(tt[0-9]+)\]</cleandatetime>
        <cleanstrings></cleanstrings>
    </video>

But it is still not working. The debug log says:

Code:
22:30:05 T:468   DEBUG: VideoInfoScanner: No NFO file found. Using title search for 'E:\Filme.HD\Zweiohrküken (2009) [tt1343755].mkv'
22:30:06 T:4020   DEBUG: Thread CVideoInfoDownloader start, auto delete: 0
22:30:06 T:4020   DEBUG: ADDON::CScraper::FindMovie: Searching for 'tt1343755' using The MovieDB scraper (path: 'C:\Users\MediaPortal\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.0.6')
22:30:06 T:4020   DEBUG: scraper: CreateSearchUrl returned <url>http://api.themoviedb.org/2.1/Movie.imdbLookup/de/xml/57983e31fb435df4df77afb854740ea9/tt1343755</url>
22:30:06 T:4020   DEBUG: FileCurl::Open(003DD448) http://api.themoviedb.org/2.1/Movie.imdbLookup/de/xml/57983e31fb435df4df77afb854740ea9/tt1343755
22:30:06 T:4020    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://api.themoviedb.org
22:30:06 T:4020   DEBUG: scraper: GetSearchResults returned <results></results>
22:30:06 T:4020   DEBUG: ADDON::CScraper::FindMovie: Searching for 'tt1343755' using The MovieDB scraper (path: 'C:\Users\MediaPortal\AppData\Roaming\XBMC\addons\metadata.themoviedb.org', content: 'movies', version: '3.0.6')
22:30:06 T:4020   DEBUG: scraper: CreateSearchUrl returned <url>http://api.themoviedb.org/2.1/Movie.imdbLookup/de/xml/57983e31fb435df4df77afb854740ea9/tt1343755</url>
22:30:06 T:4020   DEBUG: FileCurl::Open(003DD448) http://api.themoviedb.org/2.1/Movie.imdbLookup/de/xml/57983e31fb435df4df77afb854740ea9/tt1343755
22:30:06 T:4020   DEBUG: scraper: GetSearchResults returned <results></results>
22:30:06 T:4020   DEBUG: Thread CVideoInfoDownloader 4020 terminating
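Assuming the cleaning step works as described in #4 (group 1 of the cleandatetime match becomes the title, group 2 the year), the custom pattern explains the "Searching for 'tt1343755'" line: its only capture group is the id, and since there is no second group, the year is lost. A minimal Python model of just that step:

```python
import re

# The cleandatetime value from the advancedsettings.xml above.
CLEAN_DATETIME = r"\[(tt[0-9]+)\]"


def title_and_year(stem):
    """Model of the cleandatetime step only: group 1 -> title, group 2 -> year."""
    m = re.search(CLEAN_DATETIME, stem)
    if not m:
        # No match: the name is passed on unchanged and no year is extracted.
        return stem, ""
    year = (m.group(2) or "") if m.re.groups >= 2 else ""
    return m.group(1), year


print(title_and_year("Zweiohrküken (2009) [tt1343755]"))  # ('tt1343755', '')
```

In this model, a file name without a [ttNNN] tag passes through unchanged, year text included.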

It looks like it is not enough to just replace the CreateSearchUrl in tmdb.xml with this new code:

Code:
    <CreateSearchUrl dest="3">
    <!--
      $$1 now contains the IMDB id
    -->
        <RegExp input="$$1" output="&lt;url&gt;http://api.themoviedb.org/2.1/Movie.imdbLookup/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1&lt;/url&gt;" dest="3">
            <expression noclean="1">(.+)</expression>
        </RegExp>    
    </CreateSearchUrl>
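Roughly, a `<RegExp>` applies its `expression` to `input` and writes `output`, with `\1` replaced by the first capture group, into the `dest` buffer. The function below is a hypothetical Python model of that substitution only (no `$INFO[...]` expansion, `noclean`, repeat or buffer-append handling; `$INFO[language]` is assumed to be `de`). It reproduces the URL from the log exactly, which is consistent with the URL being fine and the trouble starting after it is fetched.

```python
import re


def apply_regexp(input_text, expression, output):
    """Hypothetical model of a scraper <RegExp>: match once, fill \\1..\\9 into output."""
    m = re.search(expression, input_text)
    if m is None:
        return ""

    def fill(ref):
        n = int(ref.group(1))
        # Missing or non-participating groups become empty strings.
        return (m.group(n) or "") if n <= m.re.groups else ""

    return re.sub(r"\\([1-9])", fill, output)


# Output attribute after XML unescaping, with $INFO[language] assumed to be "de".
template = ("<url>http://api.themoviedb.org/2.1/Movie.imdbLookup/de/xml/"
            "57983e31fb435df4df77afb854740ea9/\\1</url>")
print(apply_regexp("tt1343755", r"(.+)", template))
```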

Do I have to change GetSearchResults too? But why?
#6
Try changing the "xml" to "json" in the url.

(This is just a guess, btw)
#7
I tried it with JSON output, but it didn't work. The search URL is created correctly, but it returns the movie details directly, and that seems to be the problem: the GetSearchResults step should be skipped and GetDetails called immediately.

I switched to the IMDB scraper now. That one even works without modifications using the IMDB id as title (with the above advancedsettings.xml modification).
#8
In case anyone is interested, this works with Gotham and the Universal scraper.

I personally think this idea is genius - almost no messing around. IMHO using a hint in the filename/foldername is simple and effective for movies that the scraper doesn't detect correctly. I would like this to be standard functionality.

For any newbies like me looking to do this, here is a step-by-step of what I did (all based on the great work above):

1) Create "advancedsettings.xml" in the XBMC userdata folder (~/Library/Application\ Support/XBMC/userdata/advancedsettings.xml in my case) containing:

<advancedsettings>
    <video>
        <!-- Get IMDB id from filename into $1 for CreateSearchUrl -->
        <cleandatetime>\[(tt[0-9]+)\]</cleandatetime>
        <cleanstrings></cleanstrings>
    </video>
</advancedsettings>

2) Edit the folder name or filename - in my case I use "Movies are stored in Folder matching their names" in my scraper settings, so I edit the folder name.

MovieTitle [ttNNN], where ttNNN is the IMDB movie ID

I have not tested all cases, but this simple example worked perfectly:

Nutcracker [tt0796227]/
    file.mkv
#9
Hi,

I have a similar problem. All my video files follow a very specific file name scheme.

Example: "Star Trek 12 - Into Darkness (2013) {1408101} [1080p BluRay AVC Remux TrueHD DL] HDS.mkv"

I tested it with a Java program; all files matched this pattern:

"\\A(.*) \\((\\d{4})\\) \\{(\\d+)\\}(?: \\[([^\\[_]*)])?(?: ([^\\[_]*))?(?: \\[(\\d+)_(\\d+)])?\\.(.+)\\Z"

1: Title
2: Year
3: IMDB ID without "tt"
4: Junk (optional)
5: Junk (optional)
6 & 7: File part X of Y (optional)
8: File extension
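As a stand-alone check outside XBMC, the same pattern can be used in Python once the Java string escaping is removed (each `\\` becomes `\` in a raw string). The helper `parse_name` is hypothetical; it returns the title, the year and the id with the usual "tt" prefix added, or `None` for names that do not follow the scheme.

```python
import re

# The Java pattern from above with the Java string escaping removed.
# (Python's \Z never allows a trailing newline, unlike Java's; irrelevant for file names.)
NAME_PATTERN = re.compile(
    r"\A(.*) \((\d{4})\) \{(\d+)\}"  # 1 title, 2 year, 3 IMDB number
    r"(?: \[([^\[_]*)])?"            # 4 bracketed junk (optional)
    r"(?: ([^\[_]*))?"               # 5 trailing junk (optional)
    r"(?: \[(\d+)_(\d+)])?"          # 6, 7 part X of Y (optional)
    r"\.(.+)\Z"                      # 8 extension
)


def parse_name(filename):
    """Return (title, year, imdb_id) for names in this scheme, else None."""
    m = NAME_PATTERN.match(filename)
    if m is None:
        return None
    return m.group(1), m.group(2), "tt" + m.group(3)


print(parse_name("Star Trek 12 - Into Darkness (2013) {1408101} "
                 "[1080p BluRay AVC Remux TrueHD DL] HDS.mkv"))
# ('Star Trek 12 - Into Darkness', '2013', 'tt1408101')
```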


Now I would like to extend the scraper so that, for file names in this particular scheme, it uses the IMDB ID directly.
But the existing scraping methods should not be touched.



What should I do?

Edit "XBMC\addons\metadata.themoviedb.org":
Add a "<RegExp> ... </RegExp>" block to "<CreateSearchUrl>"

Edit (or create) "%APPDATA%\XBMC\userdata\advancedsettings.xml":

Change the "cleandatetime" setting so that everything after a year value is not discarded.
http://wiki.xbmc.org/?title=advancedsett...andatetime

Can I enter more than one regexp in "cleandatetime", as I can in "cleanstrings"?
http://wiki.xbmc.org/?title=advancedsett...eanstrings

Where can I find the default values for the advanced settings? Is the information in the wiki correct?
#10
After several hours of confusion, I'm a few steps further.

I switched to the "Universal Movie Scraper" and now use it instead of "The Movie Database".

To recap, my file names have the following format: "TITLE (YEAR) {IMDBID} [JUNK] JUNK [X_Y].EXT"

The variable $$1 in <CreateSearchUrl> includes: "TITLE (YEAR) {IMDBID}"

But the content of the variable is URL-encoded, so it now looks like this: "TITLE%20(YEAR)%20%7bIMDBID%7d"

If there are several <RegExp> elements in <CreateSearchUrl>, the scraper uses only the last one. Even if it fails, the others are not used.

My "XBMC\addons\metadata.universal\universal.xml" now looks like this:
Code:
...
    <CreateSearchUrl dest="3" clearbuffers="no">
        <RegExp conditional="tmdbsearch" input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
        <RegExp conditional="imdbsearch" input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?q=\1&amp;s=tt|accept-language=en-us&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="%20(\1)" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <RegExp input="$$1" output="\1" dest="9">
                <expression clear="yes" noclean="1"/>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>

        <!-- Matches "TITLE (YEAR) {NNNNNNN}" and interprets NNNNNNN as the IMDB ID -->
        <RegExp conditional="imdbsearch" input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?q=tt\1&amp;s=tt&ampfobar=|accept-language=en-us&lt;/url&gt;" dest="3">
            <expression clear="yes" noclean="1">.*%20\(\d{4}\)%20%7b(\d{7})%7d</expression>
        </RegExp>

    </CreateSearchUrl>
...
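The last `<RegExp>` can be exercised outside XBMC against a query string encoded the way this post describes. `encode_like_post` is a hypothetical helper that only mimics the reported encoding (spaces as `%20`, braces as lowercase `%7b`/`%7d`, parentheses untouched); it is not XBMC's actual URL encoder, and `imdb_id_from_query` is likewise illustrative.

```python
import re

# Expression from the last <RegExp> in CreateSearchUrl above.
IMDB_IN_QUERY = re.compile(r".*%20\(\d{4}\)%20%7b(\d{7})%7d")


def encode_like_post(text):
    """Hypothetical encoder mimicking only what the post reports:
    spaces -> %20, braces -> %7b / %7d, parentheses left as-is."""
    return text.replace(" ", "%20").replace("{", "%7b").replace("}", "%7d")


def imdb_id_from_query(query):
    """Return "tt" + the seven-digit number if the query matches, else None."""
    m = IMDB_IN_QUERY.search(query)
    return "tt" + m.group(1) if m else None


query = encode_like_post("Star Trek 12 - Into Darkness (2013) {1408101}")
print(query)
print(imdb_id_from_query(query))  # tt1408101
```

Note that `\d{7}` accepts exactly seven digits; ids with more digits would need `\d+` instead.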

This works for now. Now that I have slowly come to understand how the scraper works, I will try "The Movie Database" again.

Edit:
My test file name contained an invalid year, which is why it worked.
If the name contains a year between 1900 and 2099, the cleandatetime function removes all characters after it.
"abcdefg (2999) {0232500} [hijklmn] opqrstu [1_2].mkv"

In any case, the cleandatetime setting in advancedsettings.xml must be changed to make it work.
Code:
<advancedsettings>
    <video>
        <cleandatetime>(.*)</cleandatetime>
    </video>
</advancedsettings>
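The difference between a year-based cleandatetime and `(.*)` can be modelled in a few lines. `YEAR_PATTERN` is a hypothetical approximation of the behavior observed above (a 19xx or 20xx year ends the title), not the real default; `(.*)` is the value from the advancedsettings.xml snippet. In both cases only group 1 is kept by this step.

```python
import re

# Hypothetical approximation of a year-based cleandatetime (19xx/20xx only).
YEAR_PATTERN = r"(.*[^ _.()\[\]\-])[ _.()\[\]\-]+((?:19|20)[0-9]{2})(?:[ _.()\[\]\-]|$)"
# The replacement value from advancedsettings.xml above.
KEEP_ALL = r"(.*)"


def date_step(stem, pattern):
    """Model of the cleandatetime step: group 1 is kept, the rest is dropped."""
    m = re.search(pattern, stem)
    return m.group(1) if m else stem


names = [
    "Star Trek 12 - Into Darkness (2013) {1408101} [1080p BluRay AVC Remux TrueHD DL] HDS",
    "abcdefg (2999) {0232500} [hijklmn] opqrstu [1_2]",
]
for name in names:
    print(date_step(name, YEAR_PATTERN), "|", date_step(name, KEEP_ALL))
```

With the year-based pattern, the "{1408101}" part is cut off along with everything else after 2013, while the 2999 test name survives untouched; with `(.*)`, both names pass through whole.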
