Kodi Community Forum

TMDB Movie Scraper not working
Entire file

Code:
<?xml version="1.0" encoding="UTF-8"?>
<scraper framework="1.1" date="2012-01-16">
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://api.tmdb.org/3/search/movie?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;query=\1&amp;amp;year=$$4&amp;amp;language=$INFO[language]&lt;/url&gt;" dest="3">
            <RegExp input="$$2" output="\1" dest="4">
                <expression clear="yes">(.+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>
    <NfoUrl dest="3">
        <RegExp input="$$1" output="&lt;details&gt;&lt;url&gt;http://api.tmdb.org/3/movie/\2?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;language=$INFO[language]&lt;/url&gt;&lt;id&gt;\2&lt;/id&gt;&lt;/details&gt;" dest="3">
            <expression clear="yes" noclean="1">(themoviedb.org/movie/)([0-9]*)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;details&gt;&lt;url cache=&quot;tmdb-$INFO[language]-tt\1.json&quot;&gt;http://api.tmdb.org/3/movie/tt\1?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;language=$INFO[language]&lt;/url&gt;&lt;id&gt;tt\1&lt;/id&gt;&lt;/details&gt;" dest="3">
            <expression>imdb....?/title/tt([0-9]+)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;details&gt;&lt;url cache=&quot;tmdb-$INFO[language]-tt\1.json&quot;&gt;http://api.tmdb.org/3/movie/tt\1?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;language=$INFO[language]&lt;/url&gt;&lt;id&gt;tt\1&lt;/id&gt;&lt;/details&gt;" dest="3">
            <expression>imdb....?/Title\?t{0,2}([0-9]+)</expression>
        </RegExp>
    </NfoUrl>
    <GetSearchResults dest="8">
        <RegExp input="$$3" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <!-- Groups: \1 = year, \2 = TMDb id, \4 = title. The cache file and detail URL are keyed by the id. -->
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\4&lt;/title&gt;&lt;id&gt;\2&lt;/id&gt;&lt;year&gt;\1&lt;/year&gt;&lt;url cache=&quot;tmdb-$INFO[language]-\2.json&quot;&gt;http://api.tmdb.org/3/movie/\2?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;language=$INFO[language]&lt;/url&gt;&lt;/entity&gt;" dest="3">
                <expression repeat="yes">&quot;overview&quot;:.*?,&quot;release_date&quot;:&quot;([0-9]+)-.*?,&quot;id&quot;:([0-9]*),&quot;original_title&quot;:&quot;([^&quot;]*)&quot;,&quot;original_language&quot;:&quot;[^&quot;]*&quot;,&quot;title&quot;:&quot;([^&quot;]*)</expression>
            </RegExp>
            <!-- Groups: \1 = year, \2 = TMDb id, \3 = original title. -->
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3&lt;/title&gt;&lt;id&gt;\2&lt;/id&gt;&lt;year&gt;\1&lt;/year&gt;&lt;url cache=&quot;tmdb-$INFO[language]-\2.json&quot;&gt;http://api.tmdb.org/3/movie/\2?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;language=$INFO[language]&lt;/url&gt;&lt;/entity&gt;" dest="3+">
                <expression repeat="yes">&quot;overview&quot;:.*?,&quot;release_date&quot;:&quot;([0-9]+)-,&quot;id&quot;:([0-9]*),&quot;original_title&quot;:&quot;([^&quot;]*)&quot;,&quot;original_language&quot;:&quot;[^&quot;]*&quot;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;id&gt;\1&lt;/id&gt;&lt;url cache=&quot;tmdb-$INFO[language]-\1.json&quot;&gt;http://api.tmdb.org/3/movie/\1?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;language=$INFO[language]&lt;/url&gt;&lt;/entity&gt;" dest="3+">
                <expression repeat="yes">&quot;overview&quot;:.*?,&quot;release_date&quot;:null,&quot;id&quot;:([0-9]*),&quot;original_title&quot;:&quot;([^&quot;]*)&quot;,&quot;original_language&quot;:&quot;[^&quot;]*&quot;</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="&lt;id&gt;\1&lt;/id&gt;" dest="5">
                <expression noclean="1">&quot;id&quot;:([0-9]*),&quot;imdb</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;id&gt;\1&lt;/id&gt;" dest="5+">
                <expression clear="yes" noclean="1">&quot;id&quot;:[0-9]*,&quot;imdb_id&quot;:&quot;([^&quot;]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="\1" dest="9">
                <expression fixchars="1">&quot;original_title&quot;:&quot;([^&quot;]*)</expression>
            </RegExp>
            <RegExp conditional="keeporiginaltitle" input="$$9" output="&lt;title&gt;\1&lt;/title&gt;" dest="5+">
                <expression/>
            </RegExp>
            <RegExp conditional="!keeporiginaltitle" input="$$2" output="&lt;chain function=&quot;GetTMDBTitleByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$9" output="&lt;originaltitle&gt;\1&lt;/originaltitle&gt;" dest="5+">
                <expression/>
            </RegExp>
            <RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="5+">
                <expression noclean="1">&quot;release_date&quot;:&quot;([0-9]+)-</expression>
            </RegExp>
            <RegExp input="$$1" output="\1" dest="10">
                <expression clear="yes" noclean="1">&quot;runtime&quot;:([0-9]+)</expression>
            </RegExp>
            <RegExp input="$$10" output="&lt;url function=&quot;ParseFallbackTMDBRuntime&quot; cache=&quot;tmdb-en-$$2.json&quot;&gt;http://api.tmdb.org/3/movie/$$2?api_key=57983e31fb435df4df77afb854740ea9&amp;amp;language=en&lt;/url&gt;" dest="5+">
                <expression>^$</expression>
            </RegExp>
            <RegExp input="$$10" output="&lt;runtime&gt;\1&lt;/runtime&gt;" dest="5+">
                <expression>(.+)</expression>
            </RegExp>
            <RegExp input="$INFO[RatingS]" output="&lt;chain function=&quot;GetIMDBRatingById&quot;&gt;$$6&lt;/chain&gt;" dest="5+">
                <RegExp input="$$1" output="\1" dest="6">
                    <expression noclean="1">&quot;id&quot;:[0-9]*,&quot;imdb_id&quot;:&quot;([^&quot;]*)</expression>
                </RegExp>
                <expression>IMDb</expression>
            </RegExp>
            <RegExp input="$INFO[RatingS]" output="&lt;chain function=&quot;GetTMDBRatingByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression>TMDb</expression>
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBStudioByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBCountryByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBDirectorsByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBWitersByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBCertificationsByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBSetByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBPlotByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBTaglineByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBCastByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBGenresByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp input="$$2" output="&lt;chain function=&quot;GetTMDBThumbsByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp conditional="fanart" input="$$2" output="&lt;chain function=&quot;GetTMDBFanartByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <RegExp conditional="trailer" input="$$2" output="&lt;chain function=&quot;GetTMDBTrailerByIdChain&quot;&gt;$$2&lt;/chain&gt;" dest="5+">
                <expression />
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetDetails>
    <ParseFallbackTMDBRuntime dest="5">
        <RegExp input="$$2" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
            <RegExp input="$$1" output="&lt;runtime&gt;\1&lt;/runtime&gt;" dest="2">
                <expression noclean="1">&quot;runtime&quot;:([0-9]+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </ParseFallbackTMDBRuntime>
</scraper>
(2015-11-13, 03:53)shiggity Wrote: [ -> ]I think y'all want this instead:
...

Indeed, it seems to work well on the handful of files I tested!
(2015-11-13, 04:02)shiggity Wrote: [ -> ]Entire file

Thanks muchly! Was rebuilding today and wondering why my movie library was missing. Seems to be working now with the manually applied patch.

Though it does beg the question as to why they are parsing JSON with a regex? Don't get me wrong, I love using regexes, but when parsing results from an API that is being updated regularly? Probably not a good idea.
With those latest changes, I still get some weirdness, though. For example, my directory "The Imitation Game (2014)" is tagged as "Bratz: Glitz 'n' Glamour (2006)" :-o
So how long does this typically take before kodi is updated to work again?
(2015-11-13, 05:33)petersmith Wrote: [ -> ]With those latest changes, I still get some weirdness, though. For example, my directory "The Imitation Game (2014)" is tagged as "Bratz: Glitz 'n' Glamour (2006)" :-o

I've had that happen before for random movies, even when everything else scanned fine. Throw an NFO file in the Imitation Game folder with the IMDB ID in it.

Eg, part of the NFO file I have for 3:10 to Yuma:

Code:
<?xml version="1.0" encoding="utf-8"?>
<movie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <id>tt0381849</id>
</movie>

iirc, all you need is the IMDB ID and Kodi will grab the rest.

Edit: grammar
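
If you have more than a couple of folders to fix, the NFO tip above can be scripted. This is only a rough sketch: the helper name `write_minimal_nfo` is made up for illustration, the `movie.nfo` filename is an assumption about what your Kodi setup picks up, and the namespace attributes from the example above are left out.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_minimal_nfo(movie_dir, imdb_id, filename="movie.nfo"):
    """Write an NFO that holds only the IMDb ID and return the path written.

    Hypothetical helper: the default filename is an assumption, adjust it
    to whatever naming your library uses.
    """
    root = ET.Element("movie")
    ET.SubElement(root, "id").text = imdb_id
    path = Path(movie_dir) / filename
    # xml_declaration=True adds the <?xml ...?> header line.
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    return path
```

Calling `write_minimal_nfo("/path/to/3.10 to Yuma (2007)", "tt0381849")` would drop the file into that folder (the path here is just a placeholder); a library rescan or refresh is still needed afterwards.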
(2015-11-13, 05:38)DAlba Wrote: [ -> ]So how long does this typically take before kodi is updated to work again?

If you don't want to wait, you can manually patch the scraper for the time being using shiggity's posts above.
(2015-11-13, 05:43)baraqyal Wrote: [ -> ]
(2015-11-13, 05:38)DAlba Wrote: [ -> ]So how long does this typically take before kodi is updated to work again?

If you don't want to wait, you can manually patch the scraper for the time being using shiggity's posts above.

I would, but I have no idea how :(
(2015-11-13, 05:41)baraqyal Wrote: [ -> ]Eg, part of the NFO file I have for 3:10 to Yuma:

Sweet, thanks for that example, I'll try that :D
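
For anyone curious what the scraper itself looks for in an NFO: the `<NfoUrl>` section of the file posted earlier in the thread searches the NFO text for a TMDb or IMDb link. The sketch below reuses two of those expressions in Python to show what they capture. The function name `id_from_nfo` and the sample URLs are made up, and Kodi's own regex engine and matching order may behave differently, so treat it as an approximation.

```python
import re

# Expressions copied from the <NfoUrl> section of the scraper file.
TMDB_URL = re.compile(r"(themoviedb.org/movie/)([0-9]*)")
IMDB_URL = re.compile(r"imdb....?/title/tt([0-9]+)")


def id_from_nfo(text):
    """Return the ID the expressions above would capture, or None.

    Hypothetical helper: checks the TMDb pattern first, then the IMDb one.
    """
    match = TMDB_URL.search(text)
    if match:
        return match.group(2)          # numeric TMDb id
    match = IMDB_URL.search(text)
    if match:
        return "tt" + match.group(1)   # the scraper re-adds the "tt" prefix
    return None
```

With this, `id_from_nfo("http://www.imdb.com/title/tt0381849/")` gives back the same ID used in the NFO example above.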
A fix for this has just been pushed to the official repo. Now you only need to be patient until your scraper add-on auto-updates itself in Kodi.

A word of advice: don't apply changes from forum posts manually unless you REALLY know what you are doing. If you get it wrong, your scraper might not auto-update when the official fix is out. I can't imagine that you can't wait a day or two. In this case it was roughly 12 hours...
@olympia: awesome, thanks for the quick fix!!
(2015-11-13, 06:12)olympia Wrote: [ -> ]A fix for this has just been pushed to the official repo. Now you only need to be patient until your scraper add-on auto-updates itself in Kodi.

A word of advice: don't apply changes from forum posts manually unless you REALLY know what you are doing. If you get it wrong, your scraper might not auto-update when the official fix is out. I can't imagine that you can't wait a day or two. In this case it was roughly 12 hours...

Nice! Is there a way to force autoupdate?
(2015-11-13, 05:06)baraqyal Wrote: [ -> ]
(2015-11-13, 04:02)shiggity Wrote: [ -> ]Entire file

Thanks muchly! Was rebuilding today and wondering why my movie library was missing. Seems to be working now with the manually applied patch.

Though it does beg the question as to why they are parsing JSON with a regex? Don't get me wrong, I love using regexes, but when parsing results from an API that is being updated regularly? Probably not a good idea.

you're welcome. feel free to up-rep me if it helped (i'll have to sneak a peek at what actually ended up getting committed but i imagine it helped at least a liiiiiiiiiiittle bit) ;)

Oh yeah, I was going to bring this up after the issue has been resolved and it sounds like it has:

JSON should *NOT* depend on key order, but the historical scraper XML uses regexes, which break as soon as the order changes. I don't want to be all "do this, do that", but I definitely agree: an actual JSON parser in the scraper code would be awesome. I could help somewhere if need be. I didn't get my BSCS for nothing, dammit! :)
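
To make the ordering point concrete, here is a small Python sketch, not anything from the Kodi code base. Both payloads are made up and only loosely shaped like a search response. The regex is written against one key order and silently returns nothing when the keys move, while the JSON parser does not care.

```python
import json
import re

# Two made-up payloads: same data, different key order.
OLD_ORDER = ('{"results":[{"id":101,"original_title":"Example Movie",'
             '"release_date":"2014-11-14","title":"Example Movie"}]}')
NEW_ORDER = ('{"results":[{"release_date":"2014-11-14","id":101,'
             '"original_title":"Example Movie","title":"Example Movie"}]}')

# Regex that assumes the order id, original_title, release_date.
ORDERED = re.compile(
    r'"id":([0-9]+),"original_title":"([^"]*)","release_date":"([0-9]+)-'
)


def movies_by_regex(payload):
    """Extract (id, title, year) tuples with the order-dependent regex."""
    return [(int(i), t, y) for i, t, y in ORDERED.findall(payload)]


def movies_by_json(payload):
    """Extract the same tuples with a real JSON parser."""
    found = []
    for item in json.loads(payload)["results"]:
        # release_date may be null, so fall back to an empty year.
        year = (item.get("release_date") or "")[:4]
        found.append((item["id"], item["original_title"], year))
    return found
```

`movies_by_regex` finds the movie in `OLD_ORDER` but comes back empty for `NEW_ORDER`, whereas `movies_by_json` returns the same tuple for both.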
To update manually go to:

System > Add-ons > My add-ons > Info providers > Movie information > The Movie Database > Add-on information > Update.

It doesn't seem to work, though: my current version is 3.8.5, but no newer version is listed to update to yet.