Scraper performance in Kodi v20
#1
I just finished testing scrapers for Kodi v20, make of the results what you will...

TVShows

TVShow scraping speed test (20 tvshows, 1,273 episodes)
=======================================================

The Movie DataBase (XML) - 02mins 33secs (153 secs total)
XEM (XML)                - 04mins 32secs (272 secs total)
TMDB TVShows (Python)    - 08mins 25secs (505 secs total)
TVMaze (Python)          - 10mins 39secs (639 secs total)
The TVDB v4 (Python)     - 19mins 14secs (1,154 secs total)
The TVDB New (Python)    - 29mins 24secs (1,764 secs total)

* Tested using latest Kodi v20 nightly, default scraper options.
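
For anyone who wants the numbers in relative terms, here is a small sketch that turns the table above into seconds per episode and a slowdown factor against the fastest scraper. The timings are copied from the table; the function name `summarize` is just something made up for this illustration.

```python
# Timings copied from the TV show table above (total seconds for 1,273 episodes).
TIMINGS = {
    "The Movie DataBase (XML)": 153,
    "XEM (XML)": 272,
    "TMDB TVShows (Python)": 505,
    "TVMaze (Python)": 639,
    "The TVDB v4 (Python)": 1154,
    "The TVDB New (Python)": 1764,
}
EPISODES = 1273


def summarize(timings, episodes):
    """Return (name, seconds per episode, slowdown vs. fastest), fastest first."""
    fastest = min(timings.values())
    rows = []
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
        rows.append((name, secs / episodes, secs / fastest))
    return rows


if __name__ == "__main__":
    for name, per_episode, ratio in summarize(TIMINGS, EPISODES):
        print(f"{name:<26} {per_episode:6.3f} s/episode  {ratio:5.2f}x")
```

On these figures the Python TMDB scraper comes out at roughly 3.3x the XML one, and the slowest Python scraper at roughly 11.5x.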

Movies

Test setup - Latest nightly. 997 movies. Local artwork pre-downloaded, default settings
===============================================================
TMDB Movie (Python)     - 12min 01sec
TMDB Movie (XML)        - 12min 43sec
Universal Movie Scraper - 1hr 21min 44sec

Both tests used the sample library available here on the wiki.
#2
I ran the test also, using the same test files. A few differences...

TV Show Scraping
=====================
TheMovieDB XML - 1273 episodes; 02:42
TMDB TV Shows - 1273 episodes; 15:38
TVDB v4 - 1273 episodes; 19:40
TV Maze - 1269 episodes; 04:55

Movie Scraping
=====================
Universal Movie Scraper - 0 movies; 92:41
TVDB v4 (Python) - 949 movies; 32:37
TheMovieDB XML - 997 movies; 36:14
TheMovieDatabase Python - 997 movies; 47:51
* UMS (Universal Movie Scraper) is broken; it scraped 0 movies
** TVDB seems to be the least accurate, as it missed about 50 movies (949 of 997 scraped)
#3
Are Python scrapers slower due to the startup time of the Python interpreter? Is there a way Kodi could pass a bulk lot of file paths to the scraper, so that it processes them all and returns a list of results? I suspect that would require big changes to the Kodi system. But imagine if some APIs could search for 10 movies at once, etc.
#4
(2022-10-02, 07:24) matthuisman Wrote: Are Python scrapers slower due to the startup time of the Python interpreter? Is there a way Kodi could pass a bulk lot of file paths to the scraper, so that it processes them all and returns a list of results? I suspect that would require big changes to the Kodi system. But imagine if some APIs could search for 10 movies at once, etc.

I believe it's a combination of Python being slower (or doing more things) and the actual APIs and how many times they need to be contacted.

@pkscout knows more about the inner workings I think.

I'd be interested in breaking it down somehow, to see where the extra processing time comes from.

Clearly, taking 4x as long to scrape TV shows in v20 is a bit of an issue.
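
As a first rough step towards breaking it down, something like the sketch below could put a ceiling on the interpreter-startup part. It times a standalone Python interpreter that does nothing, which is only a proxy: Kodi embeds its interpreter, so the real per-invocation cost may well differ. The function names and the 0.05 s figure in the usage lines are assumptions for illustration, not measurements from Kodi.

```python
import subprocess
import sys
import time


def interpreter_startup_seconds(runs=5):
    """Average wall-clock time to start a fresh interpreter that does nothing."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" starts the interpreter, runs an empty statement and exits.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        total += time.perf_counter() - start
    return total / runs


def projected_overhead(startup_seconds, launches):
    """Total startup cost if the interpreter were started once per launch."""
    return startup_seconds * launches


if __name__ == "__main__":
    startup = interpreter_startup_seconds()
    print(f"measured startup: {startup:.3f} s")
    # Hypothetical worst case: one interpreter start per episode (1,273 episodes).
    print(f"projected overhead: {projected_overhead(startup, 1273):.0f} s")
```

If the projected figure is small next to the gap between the XML and Python results, startup alone can't explain it and the API round trips are the more likely suspect.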
#5
I don't actually know as much about the actual work done in Core, but my impression is that, yes, Python introduces additional processing time.  There is also an issue of the ways in which Python is currently able to interact with Core, and some of that can't be fixed until we actually fully remove support for the XML scrapers (specifically, I believe, some work around batch saving - right now we have to save every episode one at a time).  That's obviously a tough decision, as the XML scrapers sort of work for most people and are really fast.  I'm personally hoping we can remove them for v21 and work during the next development cycle to squeeze everything we can from the Python scrapers.

Comparing apples to apples, the TV show XML scraper for The Movie Database is clearly faster than the Python version.  Some of that is likely additional API calls to support additional information in the Python scraper, but the rest is the difference between Python and XML.  The differences between the various Python scrapers probably have more to do with the backend APIs than anything.  I used the TVMaze scraper code as a starting point for the TMDb TV Shows scraper, so when you see time differences between those, it's almost all API related.
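
To illustrate the general idea behind saving one at a time versus batch saving, here is a generic sketch using Python's built-in `sqlite3` module. This is not Kodi's database code: the `episode` table, its columns and all function names are made up for the example, and it only shows the difference between committing per row and committing once for a whole show.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical episode table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE episode (show TEXT, season INTEGER, number INTEGER)")
    return conn


def save_one_at_a_time(conn, episodes):
    """One INSERT and one commit per episode."""
    for ep in episodes:
        conn.execute(
            "INSERT INTO episode (show, season, number) VALUES (?, ?, ?)", ep
        )
        conn.commit()


def save_in_batch(conn, episodes):
    """All INSERTs in a single transaction, committed once at the end."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO episode (show, season, number) VALUES (?, ?, ?)", episodes
        )


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM episode").fetchone()[0]


if __name__ == "__main__":
    episodes = [("Example Show", 1, n) for n in range(1, 11)]
    a, b = make_db(), make_db()
    save_one_at_a_time(a, episodes)
    save_in_batch(b, episodes)
    print(count_rows(a), count_rows(b))
```

Both paths end up with the same rows; the batch version just crosses the storage boundary once instead of once per episode, which is where the saving would be expected to come from.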
#6
I just tried the TheMovieDB scraper on my TV shows in the local database, and things were indeed scraped quite a bit faster.
However, at least some TV shows were not fully scraped, although all relevant info is present on the TheMovieDB website.

Example: "She-Hulk Attorney At Law" has 9 episodes on the website; 8 episode files are present/have been aired, and only the first 5 were scraped, probably because there were Kodi-exported NFO files available from TheTVDB.

This log part was shown after scraping the 5th episode:
Code:
2022-10-10 11:13:47.050 T:127611 ERROR <general>: CCurlFile::FillBuffer - Failed: Server returned nothing (no headers, no data)(52)
2022-10-10 11:13:47.050 T:127611 ERROR <general>: CCurlFile::Open failed with code 0 for 92783:

2022-10-10 11:13:47.050 T:127611 ERROR <general>: Run: Unable to parse web site
2022-10-10 11:13:47.054 T:127611 WARNING <general>: No information found for item '/mnt/clrn/wd40/TVSHOWS/She-Hulk Attorney at Law/', it won't be added to the library.