Modifying TVDB scraper to get Episode Thumbs from The Movie Database (TMDB)
#1
Brick 
Hi Guys,

I'm trying to modify the TVDB scraper from the Kodi repo so that it scrapes the episode thumbnail from The Movie Database (TMDB) instead, while all other data continues to be scraped from TVDB as normal.

I've been reading the very helpful Wiki guide: http://kodi.wiki/view/HOW-TO:Write_media_scrapers but unfortunately, I can't make any sense of the part explaining how to get information from elsewhere (Chapter 3).

Would anyone be able to tell me how I can achieve this or, even better, post a modified version to achieve this?

In the TVDB xml, line 304 is the relevant part for episode thumbs.

I would really appreciate if anyone knowledgeable in scraper development is willing to donate a few mins of their time to advise me on exactly how to achieve this, bearing in mind I've no experience of scraper writing and base all my knowledge on this topic from reading the Wiki page.

Many thanks!


EDIT: I'll be using my own API key, so no need to worry about that part.
#2
It's a tricky little rabbit hole you have to go down to get this to work. However most of it is already written for the TMDB TV scraper, so it's not too tricky.

First obvious step, copy the <GetEpisodeArt> and <ParseEpisodeArt> functions from the TMDB scraper. Immediately realize you'll also need <ParseTMDBBaseImageURL>, so copy that too.

Now for <GetEpisodeArt> to work, it needs the TMDB series id and the season and episode numbers (id|S|EE), so you need a way to get the TMDB id using the TVDB id. Fortunately this is already written too, so just copy <GetTMDBId> for now - its output will need rewriting, though, as you'll need it to call <GetEpisodeArt> instead of the url.

The sequence of functions will then be:
Code:
<GetEpisodeDetails> -> <GetTMDBId> -> <GetEpisodeArt> -> <ParseTMDBBaseImageURL>
                                                     \-> <ParseEpisodeArt>

Since you also need the episode numbers, you'll have to stash them in spare buffers in <GetEpisodeDetails> and add a clearbuffers="no" to the whole function, so they'll get passed safely to <GetTMDBId>. Here you can just copy-paste lines 293-295 and 302-304 so they're outside their nested conditional, remove their own conditionals and replace their dests with some unused buffers (say, $$18 and $$19):
Code:
<GetEpisodeDetails clearbuffers="no" dest="3">

...

<RegExp input="$$8" output="\1" dest="18">
    <expression clear="yes">&lt;SeasonNumber&gt;([^&lt;]*)&lt;/SeasonNumber&gt;</expression>
</RegExp>
<RegExp input="$$8" output="\1" dest="19">
    <expression clear="yes">&lt;EpisodeNumber&gt;([^&lt;]*)&lt;/EpisodeNumber&gt;</expression>
</RegExp>
(It shouldn't matter too much where in <GetEpisodeDetails> you put them, so long as they're not behind any conditionals.)

Now all you need to do is replace the thumbnail lines with a URL call to <GetTMDBId> using buffer $$1 (which should still hold all the XML data), and an expression to capture the TVDB id:
Code:
<RegExp input="$$1" output="&lt;url function=&quot;GetTMDBId&quot;&gt;http://api.themoviedb.org/3/find/\1?api_key=INSERT_API_KEY&amp;amp;external_source=tvdb_id&lt;/url&gt;" dest="4+">
    <expression>&lt;Series&gt;.*?&lt;id&gt;(\d+)&lt;</expression>
</RegExp>
(Replace any API keys in the other functions too)

Then just replace the url output in <GetTMDBId> with a chain call to <GetEpisodeArt> (using the captured TMDB id, and the stashed season and episode numbers):
Code:
<RegExp input="$$7" output="&lt;chain function=&quot;GetEpisodeArt&quot;&gt;\1|$$18|$$19&lt;/chain&gt;" dest="5">

...And that should be it. I think. Very not tested.
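
For anyone who wants to check the chain outside Kodi first, here is a rough Python sketch of the same sequence. It is only an illustration under assumptions, not scraper code: the function names, the sample TVDB fragment and the sample TMDB response are made up, and it uses Python's json module to pick out the id rather than the scraper's regular expressions. Only the URL shapes and the two SeasonNumber/EpisodeNumber expressions come from the steps above.

```python
import json
import re

API_KEY = "INSERT_API_KEY"  # placeholder, same as in the snippets above


def stash_numbers(episode_xml):
    """Mimic the two RegExps that fill $$18 (season) and $$19 (episode)."""
    season = re.search(r"<SeasonNumber>([^<]*)</SeasonNumber>", episode_xml)
    episode = re.search(r"<EpisodeNumber>([^<]*)</EpisodeNumber>", episode_xml)
    return (season.group(1) if season else "",
            episode.group(1) if episode else "")


def find_url(tvdb_id):
    """URL passed to <GetTMDBId>: look a show up by its TVDB id."""
    return ("http://api.themoviedb.org/3/find/%s?api_key=%s"
            "&external_source=tvdb_id" % (tvdb_id, API_KEY))


def chain_argument(find_json, season, episode):
    """Mimic <GetTMDBId>: take the first tv_results id, build 'id|S|EE'."""
    results = json.loads(find_json).get("tv_results", [])
    if not results:
        return None
    return "%s|%s|%s" % (results[0]["id"], season, episode)


def images_url(chain_arg, language="en"):
    """Mimic <GetEpisodeArt>: split 'id|S|EE' and build the images URL."""
    tmdb_id, season, episode = chain_arg.split("|")
    return ("http://api.themoviedb.org/3/tv/%s/season/%s/episode/%s/images"
            "?api_key=%s&language=%s&include_image_language=%s,en,null"
            % (tmdb_id, season, episode, API_KEY, language, language))


# Made-up sample data, for illustration only.
SAMPLE_EPISODE_XML = ("<Episode><SeasonNumber>1</SeasonNumber>"
                      "<EpisodeNumber>10</EpisodeNumber></Episode>")
SAMPLE_FIND_JSON = ('{"movie_results":[],'
                    '"tv_results":[{"id":1425,"name":"House of Cards"}]}')

season, episode = stash_numbers(SAMPLE_EPISODE_XML)
arg = chain_argument(SAMPLE_FIND_JSON, season, episode)
print(find_url("262980"))
print(arg)
print(images_url(arg))
```

If the printed images URL opens in a browser with a real API key, the remaining work is purely on the scraper side (buffers and function calls).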

This obviously relies heavily on the TVDB and TMDB both having the same episode ordering, although you can use the TVDB episode id to get the TMDB episode numbers (using the same URL as for <GetTMDBId>)
...Except that won't get you the TMDB series id (which you still need). You'd need to figure a way to use <GetTMDBId> and a completely new function (to parse the episode numbers) such that you can call <GetEpisodeArt> with all the necessary parts. Further and further down the rabbit hole...

(Just a warning though, the whole scraper is likely going to be updated to a complete rewrite in the near future.)
#3
Hi scudlee,

First off, I'd like to give a HUGE thank you to you for actually taking the time to explain step-by-step everything I need to do and not only that, but actually write the code. Written out very well for a noob like me. I honestly didn't expect a reply till about a month+ later.

I've followed your steps and run the scraper but the episode thumbs have not downloaded. I have a strong feeling we're very close and I'm guessing there's a slight error somewhere, so it's just a matter of finding out where it is. Looking through the code after I've followed your steps, I have a feeling it's something to do with the cache/config part in GetEpisodeArt which has been copy/pasted from TMDB:

Code:
cache=&quot;tmdb-config.json&quot;

but I can't say for sure. As far as I'm aware, the only part of the config/cache used by GetEpisodeArt is the value of base_url, which I know is "http://image.tmdb.org/t/p/".

Alternatively, it may have something to do with me misunderstanding what you meant by:

Quote:Here you can just copy-paste lines 293-295 and 302-304 so they're outside their nested conditional, remove their own conditionals and replace their dests with some unused buffers (say, $$18 and $$19)

I wasn't sure if you meant copy/paste these lines from TVDB or TMDB. Either way, the lines seem to be mismatched so it could be something to do with the TVDB/TMDB versions we're both using. I took it as meaning take out the SeasonNumber and EpisodeNumber ones from TVDB and place them outside of their conditionals.

Anyways, here's all the code I've used (with API key removed) for you to have a quick check through and tell me what may have gone wrong:

Code:
    <GetEpisodeDetails clearbuffers="no" dest="3">

        <RegExp input="$$8" output="\1" dest="18">
            <expression clear="yes">&lt;SeasonNumber&gt;([^&lt;]*)&lt;/SeasonNumber&gt;</expression>
        </RegExp>
        <RegExp input="$$8" output="\1" dest="19">
            <expression clear="yes">&lt;EpisodeNumber&gt;([^&lt;]*)&lt;/EpisodeNumber&gt;</expression>
        </RegExp>

        [........]

    </GetEpisodeDetails>


Code:
    <GetTMDBId dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$7" output="&lt;chain function=&quot;GetEpisodeArt&quot;&gt;\1|$$18|$$19&lt;/chain&gt;" dest="5">
                <RegExp input="$$1" output="\1" dest="7">
                    <expression noclean="1">"tv_results":\[([^\]]+)\]</expression>
                </RegExp>
                <expression>"id":([0-9]+)</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetTMDBId>

    <GetEpisodeArt dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="&lt;url function=&quot;ParseTMDBBaseImageURL&quot; cache=&quot;tmdb-config.json&quot;&gt;http://api.themoviedb.org/3/configuration?api_key=INSERT_API_KEY&lt;/url&gt;" dest="5">
                <expression>^([0-9]+)\|</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;url cache=&quot;tmdb-\1-$INFO[language]-episode-s\2e\3.json&quot; function=&quot;ParseEpisodeArt&quot;&gt;http://api.themoviedb.org/3/tv/\1/season/\2/episode/\3/images?api_key=INSERT_API_KEY&amp;amp;language=$INFO[language]&amp;amp;include_image_language=$INFO[language],en,null&lt;/url&gt;" dest="5+">
                <expression>^([0-9]+)\|([0-9]+)\|([0-9]+)$</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetEpisodeArt>

    <ParseTMDBBaseImageURL clearbuffers="no" dest="4">
        <RegExp input="$$5" output="&lt;details&gt;$$20&lt;/details&gt;" dest="4">
            <RegExp input="$$1" output="\1" dest="20">
                <expression>"images":\{"base_url":"([^"]*)"</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </ParseTMDBBaseImageURL>

    <ParseEpisodeArt dest="4">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="4">
            <RegExp input="$$7" output="&lt;thumb&gt;$$20original\1&lt;/thumb&gt;" dest="5">
                <RegExp input="$$1" output="\1" dest="7">
                    <expression clear="yes">"stills":\[([^\]]*)\]</expression>
                </RegExp>
                <expression repeat="yes">"file_path":"([^"]*)"</expression>
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </ParseEpisodeArt>

Many thanks for your support, it's much appreciated.
#4
(2016-03-02, 20:51)Hustler1337 Wrote:
Quote:Here you can just copy-paste lines 293-295 and 302-304 so they're outside their nested conditional, remove their own conditionals and replace their dests with some unused buffers (say, $$18 and $$19)

I wasn't sure if you meant copy/paste these lines from TVDB or TMDB. Either way, the lines seem to be mismatched so it could be something to do with the TVDB/TMDB versions we're both using. I took it as meaning take out the SeasonNumber and EpisodeNumber ones from TVDB and place them outside of their conditionals.
Now that I think about it, I probably shouldn't have gotten the line numbers from the copy of the scraper I'm constantly messing around with... (I did mean TVDB).
(2016-03-02, 20:51)Hustler1337 Wrote: Anyways, here's all the code I've used (with API key removed) for you to have a quick check through and tell me what may have gone wrong:

Code:
    <GetEpisodeDetails clearbuffers="no" dest="3">

        <RegExp input="$$8" output="\1" dest="18">
            <expression clear="yes">&lt;SeasonNumber&gt;([^&lt;]*)&lt;/SeasonNumber&gt;</expression>
        </RegExp>
        <RegExp input="$$8" output="\1" dest="19">
            <expression clear="yes">&lt;EpisodeNumber&gt;([^&lt;]*)&lt;/EpisodeNumber&gt;</expression>
        </RegExp>

        [........]

    </GetEpisodeDetails>

Did you put those two RegExps as the first things in <GetEpisodeDetails>? I may have exaggerated slightly over it not mattering too much where you put them. They need to be at least after the first RegExp (which fills $$8).

If it's not that, the next steps would be to turn on debug logging, and also to check the scraper cache (on Windows, it will be %APPDATA%\Kodi\cache\scrapers\metadata.tvdb.com). When you refresh something, you should see the "tmdb-config.json" and "tmdb-[TMDB_id]-[language]-episode-s[Season]e[Episode].json" files appear. If you don't see them, that means the functions aren't getting called. If they do appear but the second one is missing the season and episode numbers, that would mean the buffers aren't getting filled.

If it's the former, open the debug log and ctrl-f GetEpisodeDetails, make sure the call to <GetTMDBId> appears in the <details>. If it does, find the line for GetTMDBId (should be just below) and see if the next function is called, etc. (You omitted the lines calling <GetTMDBId> in the code you posted, but I assume they're there in the [........]. All the other code looks fine.)

If it's the latter, try moving the two RegExps somewhere else in <GetEpisodeDetails>. Just before the call to <GetTMDBId> should be fine (or even dead last).
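
If checking the cache folder by eye gets tedious, a small Python helper along these lines can list which of the expected files are missing. This is just a sketch under assumptions: the helper names are made up, the TMDB id, language and episode numbers are sample values you would replace with your own, and the demo runs against a throwaway temporary folder rather than the real Kodi cache path.

```python
import os
import tempfile


def expected_cache_files(tmdb_id, language, season, episode):
    """File names the two url calls in <GetEpisodeArt> should leave behind."""
    return [
        "tmdb-config.json",
        "tmdb-%s-%s-episode-s%se%s.json" % (tmdb_id, language, season, episode),
    ]


def missing_cache_files(cache_dir, tmdb_id, language, season, episode):
    """Return the expected cache files that are not present in cache_dir."""
    present = set(os.listdir(cache_dir))
    return [name
            for name in expected_cache_files(tmdb_id, language, season, episode)
            if name not in present]


# Demo against a temporary folder that only contains the config file.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "tmdb-config.json"), "w").close()
    demo_missing = missing_cache_files(demo_dir, 1425, "en", 1, 10)
    print(demo_missing)
```

Pointing cache_dir at the real scraper cache folder and getting both names back would match the "functions aren't getting called" case. A file that exists but has empty season or episode parts in its name would instead show up as missing here, which points at the buffers.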
#5
Thanks for getting back. The two RegExps were placed somewhere down the bottom of GetEpisodeDetails, so that should hopefully be fine.

I've run the debug log and I think I've found the source of the problem thanks to your helpful guide.

Firstly, when I refresh something, no "tmdb-config.json" etc. appears, so I've had a look at the debug log, and GetTMDBId does appear within <details>. Here are the results:

Code:
23:18:48 T:123145328214016   DEBUG: scraper: GetEpisodeDetails returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><details><uniqueid>4663020</uniqueid><plot>Jimmy works his magic in the courtroom. Unexpected inspiration leads him to an unconventional pursuit of potential clients.</plot><credits></credits><credits>Vince Gilligan</credits><credits>Peter Gould</credits><director>Vince Gilligan</director><actor><name></name></actor><actor><name>Julie Ann Emery</name></actor><actor><name>Jeremy Shamos</name></actor><actor><name>Miriam Colon</name></actor><actor><name>Eileen Fogarty</name></actor><actor><name>Steven Levine</name></actor><actor><name>Daniel Spenser Levine</name></actor><actor><name>Raymond Cruz</name></actor><actor><name>Nadine Marissa</name></actor><actor><name>Sarah Minnich</name></actor><title>Uno</title><season></season><episode></episode><url function="GetTMDBId">http://api.themoviedb.org/3/find/273181?external_source=tvdb_id&api_key= INSERT_API_KEY </url><aired>2015-02-08</aired><rating>7.9</rating><votes>72</votes><runtime>45</runtime></details>
23:18:48 T:123145328214016   DEBUG: CurlFile::Open(0x7fc8e4a0a150) http://api.themoviedb.org/3/find/273181?external_source=tvdb_id&api_key= INSERT_API_KEY
23:18:48 T:123145328214016    INFO: easy_aquire - Created session to http://api.themoviedb.org
23:18:48 T:123145312653312  NOTICE: script.lazytvservice : 26.854080 :: 1.016178 :::  - notification!
23:18:49 T:123145328214016   DEBUG: Get: Using "UTF-8" charset for "http://api.themoviedb.org/3/find/273181?external_source=tvdb_id&api_key= INSERT_API_KEY"
23:18:49 T:123145328214016   DEBUG: scraper: GetTMDBId returned <details></details>
23:18:49 T:123145328214016   DEBUG: VideoInfoScanner: Adding new item to tvshows:/Users/Name/Desktop/TempTV/Better Call Saul (2015)/Season 1/Better.Call.Saul.S01E01.avi
[REPEATED FOR ALL OTHER EPISODES]

It looks like nothing is being parsed in GetTMDBId and GetEpisodeArt is not being called.

And yep, I completely forgot to post the code above for GetTMDBId (oops), but it is definitely in the xml file. Here's the code for that:

Code:
    <GetEpisodeDetails clearbuffers="no" dest="3">
        [.....]

        <RegExp input="$$8" output="\1" dest="18">
            <expression clear="yes">&lt;SeasonNumber&gt;([^&lt;]*)&lt;/SeasonNumber&gt;</expression>
        </RegExp>
        <RegExp input="$$8" output="\1" dest="19">
            <expression clear="yes">&lt;EpisodeNumber&gt;([^&lt;]*)&lt;/EpisodeNumber&gt;</expression>
        </RegExp>

        [.....]

        <RegExp input="$$1" output="&lt;url function=&quot;GetTMDBId&quot;&gt;http://api.themoviedb.org/3/find/\1?api_key=INSERT_API_KEY&amp;amp;external_source=tvdb_id&lt;/url&gt;" dest="4+">
            <expression>&lt;Series&gt;.*?&lt;id&gt;(\d+)&lt;</expression>
        </RegExp>
    </GetEpisodeDetails>
#6
(2016-03-02, 23:36)Hustler1337 Wrote: Firstly, when I refresh something, there is no "tmdb-config.json" etc appearing so I've had a look at the debug log and GetTMDBId appears within <details>.

First step would be to open the URL in a browser and confirm that it's returning what you want, i.e. a json object with a match to the TV show. The log would say if it was 404ing (or 401ing), so it's obviously a valid URL.

If the URL is returning what you want, then the next place for an error is the regexps in the function. One of them is not matching.
...And actually, from comparing the expected URL output to the RegExps in GetTMDBId, it looks like an error in the first RegExp - it doesn't expect there to be any other closing square brackets within the "tv_results" until the one at the end, when there actually are (in the "genre_ids" for one).
Try replacing it with:
Code:
"tv_results":\[\{([^\}]+)\}


(This is an actual bug in the TMDB scraper. Oops.)
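
To see why the original expression comes back empty, the two patterns can be compared in Python against a cut-down response. The sample JSON below is made up for illustration (real responses have many more fields, and the field order is an assumption); the three expressions are the ones from <GetTMDBId> and the replacement above.

```python
import re

OLD_PATTERN = r'"tv_results":\[([^\]]+)\]'    # stops at the first ']'
NEW_PATTERN = r'"tv_results":\[\{([^\}]+)\}'  # stops at the first '}'
ID_PATTERN = r'"id":([0-9]+)'

# Made-up sample: "genre_ids" holds a nested array that comes before "id".
SAMPLE = ('{"movie_results":[],"tv_results":'
          '[{"genre_ids":[18,80],"id":1425,"name":"House of Cards"}]}')


def tmdb_id(outer_pattern, text):
    """Apply the outer expression, then look for the id in its capture."""
    outer = re.search(outer_pattern, text)
    if outer is None:
        return None
    inner = re.search(ID_PATTERN, outer.group(1))
    return inner.group(1) if inner else None


print(tmdb_id(OLD_PATTERN, SAMPLE))  # None: capture ends inside genre_ids
print(tmdb_id(NEW_PATTERN, SAMPLE))  # 1425
```

The old capture is cut off at the closing bracket of "genre_ids", so the "id" never makes it into the buffer. The replacement has the mirror-image weakness: it would be cut short if the first result ever contained a nested {...} object before its "id".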
#7
Woah, great news: it's working! Well, almost.

Yep, the URL is definitely working, as I tested it in the browser yesterday; I've little clue about regex, though. I've made the change to the regexp as instructed and, for most of the episodes, the episode thumb is successfully shown. However, there appear to be random blocks of consecutive episodes where the episode thumb is not shown.

I've had a look at the debug log for the episodes that are not showing a thumb image and it looks like the error is either a buffer or HTTP error, but I'm not sure. Here's an example of an episode which has not had its episode thumb scraped:

Code:
10:35:20 T:123145320701952   DEBUG: CAnnouncementManager - Announcement: OnUpdate from xbmc
10:35:20 T:123145320701952   DEBUG: GOT ANNOUNCEMENT, type: 16, from xbmc, message OnUpdate
10:35:20 T:123145320701952   DEBUG: GetEpisodeId (/Users/JohnAppleseed/Desktop/TempTV/House of Cards (2013)/Season 1/House.of.Cards.S01E10.avi), query = select idEpisode from episode where idFile=4543
10:35:20 T:123145320701952   DEBUG: VideoInfoScanner: No NFO file found. Using title search for '/Users/JohnAppleseed/Desktop/TempTV/House of Cards (2013)/Season 1/House.of.Cards.S01E10.avi'
10:35:20 T:123145320701952   DEBUG: GetVideoDetails: Reading episode 'http://thetvdb.com/api/1D62F2F90030C444/series/262980/all/en.zip' using The TVDB scraper (file: '/Users/JohnAppleseed/Library/Application Support/Kodi/addons/metadata.tvdb.com', content: 'tvshows', version: '5.0.3')
10:35:20 T:123145320701952   DEBUG: scraper: GetEpisodeDetails returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><details><uniqueid>4481716</uniqueid><plot>Claire fuels an old flame. Peter wrestles with his demons. Francis crosses the point of no return.</plot><credits>Sarah Treem</credits><director>Carl Franklin</director><title>Chapter 10</title><season></season><episode></episode><url function="GetTMDBId">http://api.themoviedb.org/3/find/262980?external_source=tvdb_id&api_key= INSERT_API_KEY</url><aired>2013-02-01</aired><rating>7.7</rating><votes>100</votes><runtime>60</runtime></details>
10:35:20 T:123145320701952   DEBUG: CurlFile::Open(0x7fb876e74390) http://api.themoviedb.org/3/find/262980?external_source=tvdb_id&api_key=INSERT_API_KEY
10:35:20 T:123145312653312  NOTICE: script.lazytvservice : 48.007614 :: 0.323480 :::  - notification!
10:35:20 T:123145320701952   DEBUG: Get: Using "UTF-8" charset for "http://api.themoviedb.org/3/find/262980?external_source=tvdb_id&api_key=INSERT_API_KEY"
10:35:20 T:123145320701952   DEBUG: scraper: GetTMDBId returned <details><chain function="GetEpisodeArt">1425|1|10</chain></details>
10:35:20 T:123145320701952   DEBUG: scraper: GetEpisodeArt returned <details><url function="ParseTMDBBaseImageURL" cache="tmdb-config.json">http://api.themoviedb.org/3/configuration?api_key=INSERT_API_KEY</url><url cache="tmdb-1425-en-episode-s1e10.json" function="ParseEpisodeArt">http://api.themoviedb.org/3/tv/1425/season/1/episode/10/images?api_key=INSERT_API_KEY&amp;language=en&amp;include_image_language=en,en,null</url></details>
10:35:20 T:123145320701952   DEBUG: scraper: ParseTMDBBaseImageURL returned <details>http://image.tmdb.org/t/p/</details>
10:35:20 T:123145320701952   DEBUG: CurlFile::Open(0x7fb876e74390) http://api.themoviedb.org/3/tv/1425/season/1/episode/10/images?api_key=INSERT_API_KEY&language=en&include_image_language=en,en,null
10:35:21 T:123145320701952   ERROR: CCurlFile::FillBuffer - Failed: HTTP returned error 429
10:35:21 T:123145320701952   ERROR: CCurlFile::Open failed with code 429 for http://api.themoviedb.org/3/tv/1425/season/1/episode/10/images?api_key=INSERT_API_KEY&language=en&include_image_language=en,en,null
10:35:21 T:123145320701952   ERROR: Run: Unable to parse web site
10:35:21 T:123145320701952   DEBUG: VideoInfoScanner: Adding new item to tvshows:/Users/JohnAppleseed/Desktop/TempTV/House of Cards (2013)/Season 1/House.of.Cards.S01E10.avi

And here is an example of a successful episode thumb scrape (the file scraped just before it)

Code:
10:35:20 T:123145320701952   DEBUG: CAnnouncementManager - Announcement: OnUpdate from xbmc
10:35:20 T:123145320701952   DEBUG: GOT ANNOUNCEMENT, type: 16, from xbmc, message OnUpdate
10:35:20 T:123145320701952   DEBUG: GetEpisodeId (/Users/JohnAppleseed/Desktop/TempTV/House of Cards (2013)/Season 1/House.of.Cards.S01E09.avi), query = select idEpisode from episode where idFile=4542
10:35:20 T:123145320701952   DEBUG: VideoInfoScanner: No NFO file found. Using title search for '/Users/JohnAppleseed/Desktop/TempTV/House of Cards (2013)/Season 1/House.of.Cards.S01E09.avi'
10:35:20 T:123145320701952   DEBUG: GetVideoDetails: Reading episode 'http://thetvdb.com/api/1D62F2F90030C444/series/262980/all/en.zip' using The TVDB scraper (file: '/Users/JohnAppleseed/Library/Application Support/Kodi/addons/metadata.tvdb.com', content: 'tvshows', version: '5.0.3')
10:35:20 T:123145320701952   DEBUG: scraper: GetEpisodeDetails returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><details><uniqueid>4481715</uniqueid><plot>Frank tries to do whatever it takes to get the new bill passed in Congress. Russo goes on a bus campaign with the Vice President, but the VP is not making it an easy trip for him. Zoe's relationship with Frank gets a little bumpy.</plot><credits>Rick Cleveland</credits><credits>Beau Willimon</credits><director>James Foley</director><title>Chapter 9</title><season></season><episode></episode><url function="GetTMDBId">http://api.themoviedb.org/3/find/262980?external_source=tvdb_id&api_key=INSERT_API_KEY</url><aired>2013-02-01</aired><rating>7.8</rating><votes>91</votes><runtime>60</runtime></details>
10:35:20 T:123145320701952   DEBUG: CurlFile::Open(0x7fb876e743b0) http://api.themoviedb.org/3/find/262980?external_source=tvdb_id&api_key=INSERT_API_KEY
10:35:20 T:123145312653312  NOTICE: script.lazytvservice : 47.684134 :: 0.182123 :::  - notification!
10:35:20 T:123145320701952   DEBUG: Get: Using "UTF-8" charset for "http://api.themoviedb.org/3/find/262980?external_source=tvdb_id&api_key=INSERT_API_KEY"
10:35:20 T:123145320701952   DEBUG: scraper: GetTMDBId returned <details><chain function="GetEpisodeArt">1425|1|9</chain></details>
10:35:20 T:123145320701952   DEBUG: scraper: GetEpisodeArt returned <details><url function="ParseTMDBBaseImageURL" cache="tmdb-config.json">http://api.themoviedb.org/3/configuration?api_key=INSERT_API_KEY</url><url cache="tmdb-1425-en-episode-s1e9.json" function="ParseEpisodeArt">http://api.themoviedb.org/3/tv/1425/season/1/episode/9/images?api_key=INSERT_API_KEY&amp;language=en&amp;include_image_language=en,en,null</url></details>
10:35:20 T:123145320701952   DEBUG: scraper: ParseTMDBBaseImageURL returned <details>http://image.tmdb.org/t/p/</details>
10:35:20 T:123145320701952   DEBUG: CurlFile::Open(0x7fb876e743b0) http://api.themoviedb.org/3/tv/1425/season/1/episode/9/images?api_key=INSERT_API_KEY&language=en&include_image_language=en,en,null
10:35:20 T:123145320701952   DEBUG: Get: Using "UTF-8" charset for "http://api.themoviedb.org/3/tv/1425/season/1/episode/9/images?api_key=INSERT_API_KEY&language=en&include_image_language=en,en,null"
10:35:20 T:123145320701952   DEBUG: scraper: ParseEpisodeArt returned <details><thumb>http://image.tmdb.org/t/p/original/zgZg8gzvDa0yqvagURKSA706FvX.jpg</thumb><thumb>http://image.tmdb.org/t/p/original/5O2VgTQ10iDPBEhi2xlBW2EUnCM.jpg</thumb><thumb>http://image.tmdb.org/t/p/original/gufDu1Qoz0sDBD3MpMc2MGN11rf.jpg</thumb></details>
10:35:20 T:123145320701952   DEBUG: VideoInfoScanner: Adding new item to tvshows:/Users/JohnAppleseed/Desktop/TempTV/House of Cards (2013)/Season 1/House.of.Cards.S01E09.avi

It looks like after it hit the 10th episode, it stopped scraping episode thumbs and then resumed again after 11 further episodes (crossing over onto the next season) (if that’s at all relevant or means something). Also, it’s worth mentioning that if I scrape a single TV show again by ‘refresh’ or a single episode by ‘refresh’, the episode thumb scrapes successfully, so it could be buffer related.

I don't mind individually refreshing the episodes with missing thumbs as a partially working scraper is good enough for me, even if it means I've got to manually scrape some episodes. Squashing that pesky bug however would be the icing on the cake.

Thanks scudlee

EDIT: Actually, after a quick Google for the HTTP 429 error, it appears I'm exceeding TMDB's API request rate limit, which explains why I get a blackout of results.
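
For reference, this is roughly how a standalone script (not the XML scraper) could cope with a 429: wait, then try again. It is only a sketch under assumptions. The fetch function is a hypothetical callable returning (status, headers, body), the fake responses are made up, and it relies only on the general HTTP convention that a 429 reply may carry a Retry-After header.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=4, default_wait=1.0,
                     sleep=time.sleep):
    """Call fetch(url) -> (status, headers, body); wait and retry on 429."""
    status = None
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts, give up
        try:
            # Retry-After may be a number of seconds; fall back otherwise.
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(wait)
    return status, None


def make_fake_fetch(responses):
    """Hypothetical stand-in for a real HTTP call, used for the demo."""
    queue = list(responses)

    def fake_fetch(url):
        return queue.pop(0)

    return fake_fetch


waits = []  # records the waits instead of actually sleeping
fake = make_fake_fetch([
    (429, {"Retry-After": "2"}, None),
    (200, {}, '{"stills":[]}'),
])
result = fetch_with_retry(fake, "http://example.invalid/images",
                          sleep=waits.append)
print(result, waits)
```

Passing sleep in as a parameter keeps the demo instant; with the default time.sleep the function really pauses between attempts.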
#8
(2016-03-03, 13:18)Hustler1337 Wrote: EDIT: Actually, after a quick Google for the HTTP 429 error, it appears I'm exceeding TMDB's API request rate limit, which explains why I get a blackout of results.
Weird that I've never seen anyone complain about this when using the actual TMDB scraper, and that downloads a crap-ton of files per show. I even worried about the possibility when I wrote it.

There's no real way to slow the scraper down properly... However I did have a similar issue with the AniDb scraper mod, which I tried to get around by having a function that just does some "busywork" to kill time.

You can see the function on GitHub.

To use it, just copy it into your scraper and replace the $INFO[DelayValue] with a hard-coded number - no point messing around adding a new setting, just remember to change all three occurrences to the same number (try 100 to start), and then add:
Code:
<RegExp input="" output="&lt;chain function=&quot;StartDelayLoop&quot;&gt;&amp;&lt;/chain&gt;" dest="4+">
    <expression/>
</RegExp>
into <GetEpisodeDetails> (probably better after the call to <GetTMDBId> in case it messes with the buffers).

Might work, might not.
#9
Thanks for the reply scudlee. IT WORKS!!

Yeah, I was kinda surprised to realise what the HTTP 429 error meant because I've never experienced this when scanning my entire library from fresh. I tried to look for a delay code in the TVDB and TMDB addons straight after my previous post but couldn't find anything and you've confirmed that there's no such thing.

Initially, when I tried your delay code, it didn't seem to get called when I put the call right at the bottom of <GetEpisodeDetails>. After some trial and error, it seems to work somewhere in the middle of <GetEpisodeDetails>, but after the call to <GetTMDBId> as advised.

100ms and 150ms didn't seem to work, but 200ms delay seems to be doing the job for now.

Fingers crossed this should work for my entire library as I've been testing your code on only two TV shows.

Once again, I'd like to express my gratitude for your help. Taking the time to write a clear, well-written guide, explaining what each step does and actually writing the code is very much appreciated and it's definitely not something I've overlooked. It definitely helps to have someone very knowledgeable in this field give some well-informed advice. I honestly didn't expect any replies to this thread for a long time and kinda expected some half-hearted replies. Hopefully this is as beneficial for others as it has been for me.

Thank you!

EDIT: I'll update this thread and let you know how it goes with a full library scan.
#10
Does this still work for you? If so, do you mind sharing it because I like using TMDB for the higher res thumbs but it means I can't then use Extended Info script functions that need the TVDB id.

Thanks.
