Release TheMovieDB - TV Show scraper (XML)
(2018-11-21, 14:51)docwra Wrote: The music scraper already does this I believe, maybe @DaveBlake knows how its specifically done.
Yes. Most music scrapers start by querying the MusicBrainz site to identify the artist or album, and MusicBrainz rejects requests from an IP address that arrive faster than 1 per second, so music scraping is throttled to that rate. The original throttling turned out to be faulty in v16 and earlier (we were hitting 503 errors), so I fixed it, but throttling was part of the original design.

Looking quickly at video scraping (something I know nothing about), it does not seem to have any throttling, so a fast processor could easily hit the limit of 40 requests per 10 seconds. VideoInfoScanner.cpp is the code to look at; compare it with MusicInfoScanner.cpp for hints.
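
As a rough illustration of the kind of throttling described above, here is a minimal Python sketch of a sliding-window limiter that allows at most 40 calls per 10 seconds. It is not Kodi's code (the real logic lives in the C++ scanner sources named above). The class name `SlidingWindowThrottle` and its injectable `clock` and `sleep` parameters are hypothetical, chosen so the behaviour can be checked without real waiting.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most max_calls within any span of `window` seconds."""

    def __init__(self, max_calls=40, window=10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.stamps = deque()  # times of recent calls, oldest first

    def wait(self):
        """Block until one more call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.stamps[0]))
            now = self.clock()
            self.stamps.popleft()
        self.stamps.append(now)
```

Calling `wait()` before each HTTP request would keep a scan under the limit. Under the same assumptions, `SlidingWindowThrottle(max_calls=1, window=1.0)` would approximate the 1 request per second rate mentioned for MusicBrainz.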
Reply
@olympia and @scudlee I spoke with TMDB, and apparently there is an API method that returns all the episode info with a single call:

Code:
https://api.themoviedb.org/3/tv/1402/season/1?api_key=###

For example, this returns an episodes array containing all of the episodes of the season, so the scraper should be able to get away with a single HTTP call for all of the episodes in a season.
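
To make the idea concrete, here is a small Python sketch that builds the season URL from the post and pulls episode entries out of a response body. Only the URL pattern and the existence of an `episodes` array come from the post. The helper names, the `episode_number` and `name` fields, and the sample body are assumptions for illustration, and no network call is made.

```python
import json

API_ROOT = "https://api.themoviedb.org/3"


def season_url(show_id, season_number, api_key):
    """Build the season endpoint URL in the form shown in the post."""
    return f"{API_ROOT}/tv/{show_id}/season/{season_number}?api_key={api_key}"


def episodes_from_season(body):
    """Return (episode_number, name) pairs from a season response body."""
    season = json.loads(body)
    return [(ep.get("episode_number"), ep.get("name"))
            for ep in season.get("episodes", [])]


# Trimmed, made-up response used only to show the assumed shape.
sample = json.dumps({
    "season_number": 1,
    "episodes": [
        {"episode_number": 1, "name": "Pilot"},
        {"episode_number": 2, "name": "Second"},
    ],
})

print(season_url(1402, 1, "###"))
print(episodes_from_season(sample))
```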
Reply
I've done a quick rewrite to use the season lists for the episode details, rather than pull each episode individually:
https://pastebin.com/MVEC6R7q
If you can replace the tmdb.xml in metadata.tvshows.themoviedb.org with contents of that pastebin and give it a whirl, let me know if it helps. 
It should reduce the number of calls per show to the number of seasons plus 1 (plus another 1 for the entire scrape).

(It also contains my Episode Group code, but that shouldn't affect anything, I was just too lazy to revert back before making the changes.)
Reply
(2018-11-21, 18:19)scudlee Wrote:
(2018-11-21, 16:05)axlt2002 Wrote: Ok, I gave a look to my log posted here and made a very rough calculation based on the curl requests to http://api.themoviedb.org/..., for example:
These are the relevant data:
 
  • First curl at: 17:34:56
  • Error 429 at: 17:35:06
  • Time interval: around 10 seconds
  • Number of curls in that interval: 41

So we are really playing around the TMDB API rate limit of 40 requests every 10 seconds...      
 If you turn on component-specific logging in the Kodi system settings, and select verbose logging for the libcURL library, you'll be able to see the Rate Limit in the returned headers in the debug log.

e.g.
Code:
16:13:18.911 T:8164 DEBUG: Curl::Debug - HEADER_IN: X-RateLimit-Limit: 40
16:13:18.912 T:8164 DEBUG: Curl::Debug - HEADER_IN: X-RateLimit-Remaining: 23
16:13:18.912 T:8164 DEBUG: Curl::Debug - HEADER_IN: X-RateLimit-Reset: 1542816811
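
For anyone who wants to pull these values out of a long debug log, a minimal Python sketch follows. The regular expression is written against the three lines shown above. The function name is hypothetical, and reading `X-RateLimit-Reset` as a Unix timestamp in seconds is an assumption based on how the value looks.

```python
import re
from datetime import datetime, timezone

HEADER_RE = re.compile(r"HEADER_IN: X-RateLimit-(Limit|Remaining|Reset): (\d+)")


def rate_limit_from_log(lines):
    """Collect the most recent X-RateLimit-* values seen in debug log lines."""
    values = {}
    for line in lines:
        match = HEADER_RE.search(line)
        if match:
            values[match.group(1).lower()] = int(match.group(2))
    return values


log = [
    "16:13:18.911 T:8164 DEBUG: Curl::Debug - HEADER_IN: X-RateLimit-Limit: 40",
    "16:13:18.912 T:8164 DEBUG: Curl::Debug - HEADER_IN: X-RateLimit-Remaining: 23",
    "16:13:18.912 T:8164 DEBUG: Curl::Debug - HEADER_IN: X-RateLimit-Reset: 1542816811",
]
info = rate_limit_from_log(log)
# Assumption: the reset value is seconds since the Unix epoch.
reset_at = datetime.fromtimestamp(info["reset"], tz=timezone.utc)
print(info, reset_at.isoformat())
```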
 
Thanks a lot for the hint @scudlee! So this is the result that confirms my rough calculation:

Code:
08:39:46.313 T:7512   ERROR: CCurlFile::Open failed with code 429 for http://api.themoviedb.org/3/tv/1402/seas...en,en,null:
                                            {"status_code":25,"status_message":"Your request count (41) is over the allowed limit of 40."}
08:39:46.313 T:7512   ERROR: ADDON::CScraper::Run: Unable to parse web site

So we can confirm that this is a rate-limiting issue. It may simply be down to the processor in use, as @DaveBlake suggested: the better the performance, the higher the number of requests in a given interval.
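
The error body in the log above already carries both numbers. A minimal Python sketch for extracting them is shown here. The function name is hypothetical, and the message wording is taken from this one log entry, so other TMDB error messages may not match the pattern.

```python
import json
import re

COUNT_RE = re.compile(r"request count \((\d+)\) is over the allowed limit of (\d+)")


def parse_rate_limit_error(body):
    """Return (request_count, allowed_limit) from a 429 body, or None."""
    try:
        message = json.loads(body).get("status_message", "")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return None
    match = COUNT_RE.search(message)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


body = ('{"status_code":25,"status_message":'
        '"Your request count (41) is over the allowed limit of 40."}')
print(parse_rate_limit_error(body))
```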

@scudlee, I'm going to try your new tmdb.xml in a while...time for a coffee now!  :blush:
Light IMDb Ratings Update - Keep updated the IMDb ratings for your movies and TV shows.
In case you found useful my work, feel free to offer me a cappuccino!
Reply
(2018-11-22, 00:27)scudlee Wrote: I've done a quick rewrite to use the season lists for the episode details, rather than pull each episode individually:
https://pastebin.com/MVEC6R7q
If you can replace the tmdb.xml in metadata.tvshows.themoviedb.org with contents of that pastebin and give it a whirl, let me know if it helps. 
It should reduce the number of calls per show to the number of seasons plus 1 (plus another 1 for the entire scrape).

(It also contains my Episode Group code, but that shouldn't affect anything, I was just too lazy to revert back before making the changes.)
Hi scudlee,

Unfortunately, the issue remains even with the new tmdb.xml you provided. I also have a debug log, but it is larger than 512 kilobytes, so I cannot upload it to pastebin. Let me know if you need it to check...

I'm not an expert, but if Kodi's code is add-on agnostic (i.e. I can use different scrapers for TV shows), I would expect changes that reduce the number of requests to be implemented at the Kodi level rather than in the add-on, but maybe I'm wrong.

In parallel, I will look into how to throttle the number of requests, as is done for MusicBrainz.
Reply
Post as much of the log as you can so we can see what's going on.

I will also test later.

EDIT: Don't forget the addon is in program files and not userdata if you are using the nightlies
Reply
If you could upload the log somewhere, that would be great. 
With the new code you shouldn't even be getting 40 hits per show (unless it has 38+ seasons), so Kodi would need to be running a full scan and adding multiple shows/episodes all within that 10-second window to hit the limit at all.
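
The arithmetic behind the "38+ seasons" remark can be sketched in a few lines of Python. This is one reading of the count given earlier in the thread (one call per season, plus 1 per show, plus 1 for the entire scrape). The function name is hypothetical, and the real scraper may issue calls that this ignores.

```python
def calls_for_scan(season_counts):
    """Estimate TMDB calls for one scan, given the season count of each show."""
    per_show = sum(seasons + 1 for seasons in season_counts)  # seasons plus 1 per show
    return per_show + 1  # plus another 1 for the entire scrape


print(calls_for_scan([38]))    # a single show with 38 seasons
print(calls_for_scan([9, 9]))  # two shows with 9 seasons each
```

On this reading a single show only reaches 40 calls at 38 seasons, while several shows scanned inside the same 10-second window add up quickly.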
Reply
Works for me :D

The lowest X-RateLimit-Remaining value it reaches with the updated scraper is:
Code:
08:45:59.151 T:2204   DEBUG: Curl::Debug - HEADER_IN: X-RateLimit-Remaining: 28

Thanks @scudlee!

EDIT: I should say that I am only testing 2 shows with 9 seasons. I wonder if it's going to hit the rate limit when I do more than that, as it's VERY FAST :)
Reply
Hi guys!

@scudlee please find the log at this link.

@docwra can you try with the test package I shared yesterday as well?
Reply
(2018-11-22, 10:48)docwra Wrote: EDIT: I should say that I am only testing 2 shows with 9 seasons. I wonder if it's going to hit the rate limit when I do more than that, as it's VERY FAST :)
 That does seem like a Catch-22 situation.  Because all the information is cached at the start, all the episode details will get added a lot faster because there's no delay while downloading the individual episode information.
Which could then mean in a full scan you could still hit the rate limit scanning multiple shows...
Reply
(2018-11-22, 11:06)axlt2002 Wrote: Hi guys!

@scudlee please find the log at this link.

@docwra can you try with the test package I shared yesterday as well?
OK, here I am after a deeper analysis. The situation is as follows:
 
  • TV shows that have parsing NFO (or combination NFO) files still have the issue. I guess this is because the URL for the TV show is already present in the various tvshow.nfo/episode.nfo files, so the requests are handled in a different way.
  • TV shows that don't have NFO files work perfectly now! I scraped 6 TV shows with a total of 326 episodes, and the minimum value of X-RateLimit-Remaining was around 24 to 26.

Thanks @scudlee! Do you think there is a way to manage the first bullet case as well? The log I posted is related to that situation...would be great to have a solution for this case as well! Let me know if I can help in some way...
Reply
(2018-11-22, 11:06)axlt2002 Wrote: Hi guys!

@scudlee please find the log at this link.

@docwra can you try with the test package I shared yesterday as well?
 Okay, there is definitely something amiss in your setup.  I see multiple hits for URLs which should have been cached for the duration of the scan.  Which is odd.
I would suggest looking in C:\Users\00918203\Programmi\Kodi\portable_data\cache\scrapers\metadata.tvshows.themoviedb.org and seeing if there are any files in there.
Possibly have that window open while scanning and see if files are being written and rewritten, or if it stays empty.

If it stays empty, there may be some permissions issue, or something, so nothing can be cached.
If they're being cached but immediately rewritten, it may be something else I don't fully understand, but it might be solved by adding a cachepersistence term to the addon.xml.
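
Instead of watching the folder by eye, the same check can be roughly scripted. The Python sketch below takes two snapshots of a directory and reports which files are new and which were rewritten in between. The function names are hypothetical, and it assumes that a rewritten cache file changes its size or modification time.

```python
import os


def snapshot(cache_dir):
    """Map each file name in cache_dir to (size, modification time in ns)."""
    result = {}
    for entry in os.scandir(cache_dir):
        if entry.is_file():
            info = entry.stat()
            result[entry.name] = (info.st_size, info.st_mtime_ns)
    return result


def diff(before, after):
    """Return (new files, rewritten files) between two snapshots."""
    new = sorted(name for name in after if name not in before)
    rewritten = sorted(name for name in after
                       if name in before and after[name] != before[name])
    return new, rewritten
```

Take one snapshot of the scraper cache folder before starting the scan and another during it. An empty second snapshot points to the "nothing can be cached" case, while a long rewritten list points to the second case.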

Edit:
(2018-11-22, 11:40)axlt2002 Wrote:  
  • TV shows that have parsing NFO (or combination NFO) files still have the issue. I guess this is because the URL for the TV show is already present in the various tvshow.nfo/episode.nfo files, so the requests are handled in a different way.

Thanks @scudlee! Do you think there is a way to manage the first bullet case as well? The log I posted is related to that situation...would be great to have a solution for this case as well! Let me know if I can help in some way... 

This is basically the "something else I don't fully understand" part I mentioned.  If you look in the addon.xml for metadata.tvdb.com, you'll see a part that says  cachepersistence="00:15".  Try adding that to the equivalent spot in the addon.xml for metadata.tvshows.themoviedb.org.  (You will need to stop and restart Kodi.)
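
If you would rather script the change than edit the file by hand, a minimal Python sketch is below. It assumes the "equivalent spot" is the scraper extension element of addon.xml, which the post does not spell out. The sample XML is a simplified, hypothetical stand-in for the real file, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def add_cache_persistence(addon_xml, value="00:15"):
    """Set cachepersistence on each scraper extension element; return XML text."""
    root = ET.fromstring(addon_xml)
    for ext in root.iter("extension"):
        # Assumption: the attribute belongs on the metadata scraper extension.
        if ext.get("point", "").startswith("xbmc.metadata.scraper"):
            ext.set("cachepersistence", value)
    return ET.tostring(root, encoding="unicode")


# Simplified, hypothetical addon.xml; the real file has more elements and attributes.
sample = (
    '<addon id="metadata.tvshows.themoviedb.org">'
    '<extension point="xbmc.metadata.scraper.tvshows" library="tmdb.xml"/>'
    '</addon>'
)
print(add_cache_persistence(sample))
```

Back up the original addon.xml before overwriting it, and restart Kodi afterwards as noted above.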
Reply
(2018-11-22, 11:40)scudlee Wrote:
(2018-11-22, 11:06)axlt2002 Wrote: Hi guys!

@scudlee please find the log at this link.

@docwra can you try with the test package I shared yesterday as well?
 Okay, there is definitely something amiss in your setup.  I see multiple hits for URLs which should have been cached for the duration of the scan.  Which is odd.
I would suggest looking in C:\Users\00918203\Programmi\Kodi\portable_data\cache\scrapers\metadata.tvshows.themoviedb.org and seeing if there are any files in there.
Possibly have that window open while scanning and see if files are being written and rewritten, or if it stays empty.

If it stays empty, there may be some permissions issue, or something, so nothing can be cached.
If they're being cached but immediately rewritten, it may be something else I don't fully understand, but it might be solved by adding a cachepersistence term to the addon.xml.
We wrote at the same time. :) Please read my post above... I think it explains the situation, and hopefully there will be a solution for it as well.
Reply
Stop writing faster than me! :) Check my edit above.
Reply
Thanks @scudlee for picking this up, and great work on the scraper, as always :)

I also found it suspicious yesterday to see multiple hits on "http://api.themoviedb.org/3/configuration?" in his log. That should definitely not happen, and it puzzled me.
Reply