TESTERS (no longer) WANTED: New TVDB scraper
#1
All features are now part of the official 2.0.x/3.0.x TVDB scraper!
Please change back and update your libraries
Thank you to everyone who tested!
What this is
A while ago I wrote an experimental new version of the TVDB scraper (using the current API), but I never got to do any extensive testing of it, and so never publicly released it. I feel it's time to change that.

This is pretty much a complete rewrite of the current TVDB scraper mostly using XSLT scraping, which was added way back for v13.0 (Gotham).
XSLT adds the potential for a lot of the more program-like functions that the regex-based scrapers have been sorely lacking - things like complex conditional structures, looping, even just basic arithmetic.
And, of course, the ability to manipulate XML-based APIs directly. (...So naturally, pretty much everyone is now using JSON for their APIs...)

What I'm looking for in this thread is for some brave souls to basically rescan their TV libraries using this scraper and report back any oddities, or successes, in order that I can fix any unforeseen bugs, and also gain some evidence and confidence that this is a viable replacement for the current scraper (so even a "didn't notice any difference" is helpful).
Naturally, I've tested it against my own library, and made some dummy files and folders to spot-check various features, but my set-up is pretty vanilla. I like to keep my files tidy, so it's possible there are cases "out in the wild" that I haven't considered.
There may also be instances where you'll want to rename or renumber files to take advantage of some of the new features (and test they work properly, too).

Also of particular interest would be its performance on different platforms - I've only really tested it on Windows, and briefly on an Android tablet. Is it noticeably faster or slower?

"But, scudlee," I hear the better-informed of you asking, "isn't there a new TVDB API already in beta? Why put out this scraper now?"
Yes, there is! And I've already written a scraper for it (...mostly), and that scraper has all the same features as this one. So I don't see why you shouldn't get the benefits now, as the new API scraper definitely shouldn't be released until at least after the API is out of beta. (This will also help me iron out any wrinkles for that version too.)

What this isn't
A feature request thread.

Download link

metadata.tvdb.com.xslt - HOW-TO:Install add-ons from zip files (wiki)


Note, requires Kodi v13.0+ (Gotham or higher). It will install on earlier versions, but it won't do a whole lot.

For testing purposes, this will install as a separate scraper from the original, in case you need/want to switch back. The new scraper will be named "The TVDB (xslt)".

If you want, you can change content on your TV source directly to this scraper, and refresh shows individually and/or add episodes to test, but I'm also hoping some of you will set the content to "None" first to clear the library and then rescan the entire thing using this version, which should make it clearer where any issues or changes in behaviour might lie.

New features
The first thing to note is that there's nothing particularly wrong with the current version of the scraper, provided you stick to the defaults - Aired order, folders named exactly according to TVDB, etc.
Once you start veering away from the defaults is when you may run into issues. Using DVD or Absolute order, or naming folders not quite as the show appears on TVDB, can all lead to unexpected outcomes.
This new version tries to address some of those issues, while hopefully still behaving the same in the default configuration.

Search improvements
The search has been improved in a couple of ways.

Firstly the year (if present) is no longer attached to the title before running the search. Instead the year is scraped from the search results and used to aid in matching. This is similar to how movie scrapers handle it (sometimes the year is included as part of the search query, but that's slightly different).
If you include the year as part of your folder name when it isn't needed, the current scraper will return no results. This was apparently so bad that extra code was added to Kodi core to automatically re-run the scraper with the year ignored if there were no results - which hid the issue in most cases.
However, take the case of a US remake of another country's show, for example Being Human and Being Human (US) - If you had instead named the folder for the remake "Being Human (2011)", on the first pass the scraper would search for "Being Human (2011)" and find no matches. On the second pass it would search for "Being Human", return both versions of the show, and crucially, match the earlier version over the remake - the year would not factor in at all.
Image
Comparison of search results between the current scraper (top), the new version (middle), and the new API version (bottom).
All searches performed on the folder "Being Human (2011)".
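To make the idea concrete, here is a minimal Python sketch of year-aware matching. It is not the scraper's XSLT and not Kodi's real matching code: the "first result with the same year wins" rule, the function names, and the sample years are all assumptions for illustration.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Result(NamedTuple):
    title: str
    year: Optional[int]


def split_title_year(folder: str) -> Tuple[str, Optional[int]]:
    """Split 'Title (YYYY)' into ('Title', YYYY); the year is None if absent."""
    m = re.match(r"^(.*?)\s*\((\d{4})\)\s*$", folder)
    if m:
        return m.group(1), int(m.group(2))
    return folder.strip(), None


def pick(folder: str, results: List[Result]) -> Optional[Result]:
    """Prefer the first result whose year matches the folder's year.

    Assumption: without a year (or without a year match) the first
    result is used, standing in for the usual title-based matching.
    """
    _, year = split_title_year(folder)
    if year is not None:
        for r in results:
            if r.year == year:
                return r
    return results[0] if results else None


# Illustrative results for a search on the bare title "Being Human".
RESULTS = [Result("Being Human", 2008), Result("Being Human (US)", 2011)]

print(pick("Being Human (2011)", RESULTS).title)
print(pick("Being Human", RESULTS).title)
```

The point is only that the year is taken out of the query and compared against the year of each result, rather than being glued onto the search string.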

Secondly, alternative/alias titles are now also returned as separate search results. The TVDB API has for quite a while now returned search results that match against alias titles, however the scraper currently only ever returns the "official" titles from the results.
This can lead to a situation where an exact match to an alias for one show is ignored for a partial match to a different show.
For example, "CSI" is an alias for the original CSI: Crime Scene Investigation, but Kodi will favor CSI: NY from the search results because it's the closest match from the official titles (since it has the fewest extra characters).
The alias titles are returned after the official titles, so an exact match to an official title will still be preferred over an exact match to an alias.
Image
Comparison of search results between the current scraper (top), the new version (middle), and the new API version (bottom).
All searches performed on the folder "CSI".
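The effect of returning aliases as extra results can be sketched in Python as follows. This is an illustration only: `difflib` stands in for whatever fuzzy matching Kodi really does, and the series keys and the result list are made up.

```python
from difflib import SequenceMatcher

# Hypothetical search results: (series key, official title, alias titles).
SEARCH_RESULTS = [
    ("original", "CSI: Crime Scene Investigation", ["CSI"]),
    ("miami", "CSI: Miami", []),
    ("ny", "CSI: NY", []),
]


def candidates(results, include_aliases):
    """List official titles first, then alias titles as extra entries."""
    out = [(key, title) for key, title, _ in results]
    if include_aliases:
        out += [(key, alias) for key, _, aliases in results for alias in aliases]
    return out


def best_match(query, cands):
    """Return the series key whose title is most similar to the query.

    max() keeps the first of several equal scores, so an official title
    still beats an alias when both match equally well.
    """
    def score(cand):
        return SequenceMatcher(None, query.lower(), cand[1].lower()).ratio()

    return max(cands, key=score)[0]


# Official titles only: the shortest partial match wins.
print(best_match("CSI", candidates(SEARCH_RESULTS, include_aliases=False)))
# With aliases: the exact alias match wins.
print(best_match("CSI", candidates(SEARCH_RESULTS, include_aliases=True)))
```

Because the aliases are appended after the official titles, the ordering itself provides the "official title preferred on a tie" behaviour.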

(Note also the massively improved search in the new API, and increased number of aliases. Handy, no doubt, if you use "CSI" as shorthand for Variety Show of Mr. Con & Ms. Csi (Result #14).)

Of course, the natural response to both those cases is, "You named your folder wrong", but that doesn't mean it needn't work.

Trakt.tv ratings
Should be fairly self-explanatory.
Image

Absolute order
The Absolute Episode order setting is currently only usable if the absolute order is defined on TVDB. If it isn't, the episodeguide the scraper returns will contain no episode or season numbers at all, and no episodes are ever found. There's no fallback (unlike DVD order).
In the new version, if there isn't a defined absolute order, the scraper will generate one itself, based on the aired order.
The formula used is:
(aired) Season X, Episode Y -> (absolute) Season 1, Episode [Y + N], where N = #(episodes sAeB: X > A > 0), i.e. the number of episodes in all earlier non-special seasons. (Note: This won't be accurate if there are episodes missing on TVDB.)
An earlier (less public) version of this feature performed counts for every episode without a defined absolute order in the episodeguide, and would be very slow for very large series, and even crashed Kodi on my tablet when testing The Daily Show (3000+ episodes). The current method performs the counts for each season only once to create a lookup table, and is significantly faster. Sadly I got rid of my old tablet that crashed, so I can't do a fair comparison, but the new tablet chugged through The Daily Show like a trouper. Unlike the earlier version, though, that one set of counts always occurs (when Absolute order is chosen), even if it is already defined for every episode, so the performance of this on other devices is definitely something that needs more testing.
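For anyone who prefers code to formulas, the lookup-table approach looks roughly like this in Python. It is a sketch of the idea, not the scraper's XSLT; the function names and the sample episode list are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def season_offsets(episodes):
    """Count the episodes of each aired season once, then build a table of
    cumulative offsets: offsets[X] = number of episodes in seasons 1..X-1.
    Season 0 (specials) is ignored."""
    counts = Counter(season for season, _ in episodes if season > 0)
    seasons = sorted(counts)
    running = [0] + list(accumulate(counts[s] for s in seasons))
    return dict(zip(seasons, running))


def to_absolute(season, episode, offsets):
    """(aired) SxEy -> (absolute) S01E[y + episodes in earlier seasons]."""
    return 1, episode + offsets[season]


# Hypothetical episode list as (aired season, aired episode) pairs.
EPISODES = [(0, 1), (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]
OFFSETS = season_offsets(EPISODES)

print(OFFSETS)                     # {1: 0, 2: 3, 3: 5}
print(to_absolute(2, 1, OFFSETS))  # (1, 4)
print(to_absolute(3, 1, OFFSETS))  # (1, 6)
```

The table is built in one pass over the episodes, after which each episode needs only a single lookup, which is why it scales so much better than counting again for every episode.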
Image
Turns out the 200th episode of Stargate SG-1 is "200". Phew!
Note the lack of episode numbers in the Absolute Order listing.
(Files are named S01E001E002.disc, S01E003.disc, ..., S01E214.disc.)

Another issue fixed, which I don't think I've ever seen brought up, but definitely happens, is that the placement of special episodes currently only ever uses the aired order, even if absolute order is chosen. This means that specials that should appear within the second, or higher, (aired) season will always appear after the single (absolute) season.
The new version maps the placement back to the absolute position (using the generated value if necessary).
Image
The first two One Piece specials appear after every other episode using the current scraper.
(S25 is really (absolute) episode 590 - This is an actual bug.)
 
Image
In the new scraper, the specials appear before the correct "Season 2" episodes.
(Episode 590 shows up correctly too.)
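The placement remapping can be pictured with a short Python sketch. The field layout (a "(season, episode)" pair the special airs before, plus a dictionary of absolute numbers defined on TVDB) is a simplification assumed for illustration, and the numbers are invented.

```python
def remap_special(airs_before, defined_absolute, offsets):
    """Translate 'airs before aired SxEy' into 'airs before absolute S01Ez'.

    airs_before:      (aired season, aired episode) the special precedes
    defined_absolute: {(season, episode): absolute number} where defined
    offsets:          {season: episodes in earlier seasons}, used to
                      generate a number when none is defined
    """
    season, episode = airs_before
    absolute = defined_absolute.get(airs_before)
    if absolute is None:
        absolute = episode + offsets[season]
    return 1, absolute


# Hypothetical data: season 1 has 8 episodes, season 2 has 12.
OFFSETS = {1: 0, 2: 8, 3: 20}
DEFINED = {(3, 1): 22}  # an explicitly defined absolute number

print(remap_special((2, 1), DEFINED, OFFSETS))  # (1, 9), generated
print(remap_special((3, 1), DEFINED, OFFSETS))  # (1, 22), defined value
```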

DVD order
The most practical new feature is in the handling of DVD order, and is really the main reason I'm making the scraper public now.
One of the biggest issues many users have with DVD order is in the handling of double episodes - in DVD order TVDB has a habit of renumbering the two halves to be two sub-parts of a single episode, e.g. S01E01.1 and S01E01.2.
If you had the sub-parts in a single file it used to be the case that you could number that file S01E01.1E01.2 (or equivalent) and both parts would be found. This currently doesn't work due to a bug, but even then this had issues - both parts would display the same episode number and would be sorted alphabetically, not by sub-part.

The new version offers an alternative - by creating a single merged episode from all the sub-parts.
So instead of S01E01.1E01.2, you number the file simply S01E01, and the details of the sub-parts are merged together.

It's probably easiest to just see this in action.

Here is the first season of Stargate SG-1 using Aired Order:
Image
(Files are named S01E01E02.disc, S01E03.disc, ..., S01E22.disc.)

And here it is again, using DVD Order:
Image
(Files are named S01E01.disc, S01E02.disc, ..., S01E21.disc.)

As you can see, the two-part opener is now listed as a single episode - the titles have been merged and shrunk (because they're identical apart from the bracketed number), the plots have been concatenated, even the ratings have been re-averaged (an IMDb/Trakt rating will only be the rating for the first part, though). All the other, unseen, details are merged as well.
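In rough Python terms, the merge works something like the sketch below. This is not the scraper's XSLT: the dictionary keys, the " / " separator for differing titles, and the plain (unweighted) rating average are assumptions, and the episode data is invented.

```python
import re
from collections import defaultdict
from statistics import mean


def split_number(dvd_episode):
    """'1.2' -> (1, 2); '3' -> (3, 0)."""
    whole, _, part = dvd_episode.partition(".")
    return int(whole), int(part or 0)


def merge_titles(titles):
    """If the titles differ only by a trailing '(n)', keep the shared stem
    once; otherwise join them (the separator is an assumption)."""
    stems = [re.sub(r"\s*\(\d+\)\s*$", "", t) for t in titles]
    if len(set(stems)) == 1:
        return stems[0]
    return " / ".join(titles)


def merge_subparts(episodes):
    """Group episodes by (season, whole episode number) and merge each group."""
    groups = defaultdict(list)
    for ep in episodes:
        whole, _ = split_number(ep["dvd_episode"])
        groups[(ep["dvd_season"], whole)].append(ep)
    merged = {}
    for key, parts in groups.items():
        parts.sort(key=lambda ep: split_number(ep["dvd_episode"]))
        merged[key] = {
            "title": merge_titles([p["title"] for p in parts]),
            "plot": " ".join(p["plot"] for p in parts),
            "rating": mean(p["rating"] for p in parts),
        }
    return merged


# Invented example: a two-part opener followed by a normal episode.
EPISODES = [
    {"dvd_season": 1, "dvd_episode": "1.2", "title": "Pilot (2)",
     "plot": "Part two.", "rating": 7.0},
    {"dvd_season": 1, "dvd_episode": "1.1", "title": "Pilot (1)",
     "plot": "Part one.", "rating": 8.0},
    {"dvd_season": 1, "dvd_episode": "2", "title": "Second",
     "plot": "Another story.", "rating": 6.0},
]

print(merge_subparts(EPISODES)[(1, 1)])
```

Sorting by the numeric sub-part before merging keeps the plots in running order regardless of how the API returned them.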

The first season of Adventure Time using Aired Order:
Image
(Files are named S01E01E02.disc, S01E03E04.disc, ..., S01E25E26.disc.)

And again, using DVD Order:
Image
(Files are named S01E01.disc, S01E02.disc, ..., S01E13.disc.)

This shows what it looks like when the titles of the sub-parts are different.

The sub-parts are still available to use individually, if you have the episode split into separate files, but as noted above, the bug prevents them from working in a single file. The only way that will work currently is as a merged episode.

Here is Adventure Time, again with DVD Order, but this time with the parts separated:
Image
(Files are named S01E01.1.disc, S01E01.2.disc, ..., S01E13.2.disc.)

Notice how some of the pairs of episodes are swapped, due to being sorted alphabetically.

Although the examples here only have two sub-parts, any number of parts will be merged.

Other minor DVD order fixes include:
  • The current scraper doesn't actually use the DVD season and DVD episode numbers directly. Instead it uses the "Combined" values, which are defined as being the DVD numbers if they exist, otherwise the Aired numbers. In rare cases, though, only the DVD season might be defined (typically for a special episode) - leaving the combined values as the DVD season/Aired episode, potentially duplicating (and blocking) an actual DVD Season/DVD episode pairing. The new scraper uses the DVD numbers directly, and then only if both are defined.

  • In even rarer cases, an aired episode may not have any DVD numbers set at all, but its aired numbers still coincide with the DVD numbers of another episode (this used to be the case for Fringe S02E11 which was really a bonus season 1 episode and was left off the DVD release - it's since been given the DVD number S01E21). To stop this blocking the actual DVD episode, but still let it be scanned, the aired episode will be remapped as the first available sub-part of that episode, so in the case of Fringe, it would have been accessible as S02E11.1 (if the DVD episode was a double episode it would be .3, etc. - it won't show up as part of the merged episode).

  • As with Absolute order, the placement of special episodes is remapped to being before the correct DVD order episodes.
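The first two fixes in the list above can be sketched together in Python. Again this is only an illustration under assumed field names ("dvd_season", "dvd_episode", "dvd_part", etc. are hypothetical), with invented episode data rather than real TVDB records.

```python
def build_dvd_guide(episodes):
    """Return {(season, episode, part): name}, where part 0 means the
    episode has no sub-part number."""
    guide = {}
    aired_only = []
    for ep in episodes:
        # Use the DVD numbers directly, and only if BOTH are defined.
        if ep.get("dvd_season") is not None and ep.get("dvd_episode") is not None:
            key = (ep["dvd_season"], ep["dvd_episode"], ep.get("dvd_part", 0))
            guide[key] = ep["name"]
        else:
            aired_only.append(ep)
    for ep in aired_only:
        season, episode = ep["aired_season"], ep["aired_episode"]
        taken = [k[2] for k in guide if k[:2] == (season, episode)]
        # No clash: keep the aired numbers as they are.
        # Clash: remap to the first sub-part after those already in use.
        part = max(taken) + 1 if taken else 0
        guide[(season, episode, part)] = ep["name"]
    return guide


# Invented data: "Bonus" has no DVD numbers, but its aired numbers
# collide with the DVD numbers of "Real DVD episode".
EPISODES = [
    {"name": "Real DVD episode", "dvd_season": 2, "dvd_episode": 11,
     "aired_season": 2, "aired_episode": 12},
    {"name": "Bonus", "dvd_season": None, "dvd_episode": None,
     "aired_season": 2, "aired_episode": 11},
]

for key, name in sorted(build_dvd_guide(EPISODES).items()):
    print(key, name)
```

Here "Bonus" ends up at (2, 11, 1), i.e. S02E11.1, while the real DVD episode keeps S02E11.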

To aid in debugging, I've also added an XML comment to the episodeguide clearly stating which ordering is being used (Aired/DVD/Absolute), as this is often a point of contention. You'll also notice in the debug log that XSLT leaves the XML output reasonably well-formatted, which, although it makes the output more readable, does massively increase the number of lines in the debug log.

...And that's the scraper.

Please test!
Reply
#2
Thanks a lot for this! Perfect timing for the v16 release. I'll rescan all I have with this now and report back as soon as I'm done.
Reply
#3
Problem (well, not really a problem, I guess):
Scanning my "Doctor Who (2005)" folder shows "Doctor Who (2005) (2005)" in the progress dialog. So the year is appended to the already existing name.

Haven't done any timings and just kept the default settings like I usually do. Did try English and Dutch, and so far that worked.
Reply
#4
Yeah, that's just going to happen because the first "(2005)" is part of the actual title and the second is the actual year scraped. It doesn't affect the details returned.

Not sure if I could crop the year from the returned title during the search, actually, since that would need a regex, and that doesn't work in XSLT 1.0. Might be possible after the XSLT as some kind of regex tidy-up, maybe.

Edit: Nope. I can do it, it's just I realize now that because that's what gets matched against, if I take the year out, then when you don't supply the year you're at the whim of the order of returned results.

e.g. If you searched for "Doctor Who" (no year attached), it will currently only match the title of the original series exactly, and so prioritize that one. If I take out the years from the titles, it will match exactly the title of both the original and the 2005 one. Which it picks would depend entirely on the order of the results.
Reply
#5
Looking very promising, just rescanned my collection with the newest Krypton nightly.
Everything looks normal from a quick check. Season and tv show count is the same as before. My setup is also pretty basic, so no fancy orders etc.

The only difference I can find is that an episode thumb for a very new episode hasn't been picked up by the new scraper, where the old one has one.

Thank you and good work.
Reply
#6
Odd about the thumbnail. Do you have a debug log? The thumbnail URLs are definitely scraped, and in both scrapers are taken directly from the API, so there shouldn't be a difference.
(In the new API, I have to generate the URL from the series and episode ids because the filename isn't supplied.)
Reply
#7
(2016-02-22, 20:33)scudlee Wrote: Odd about the thumbnail. Do you have a debug log? The thumbnail URLs are definitely scraped, and in both scrapers are taken directly from the API, so there shouldn't be a difference.
(In the new API, I have to generate the URL from the series and episode ids because the filename isn't supplied.)

You should be able to reproduce it if you rename a text file to "The Venture Bros. - 6x04 - Rapacity In Blue.mkv"

Please see http://xbmclogs.com/ppeucycy1 for a logfile. First scanned with the new scraper, then removed. Then scanned with the old one.
Reply
#8
Yep that's me being dumb. The thumbnail you're seeing from the current scraper is actually the generated one, but the new scraper is always putting a URL there, even if there is no filename.
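In Python terms, the fix amounts to something like the sketch below. The base URL is a placeholder rather than the real artwork host, and the function name is hypothetical; the actual change is in the scraper's XSLT.

```python
from typing import Optional

# Placeholder base URL for illustration only.
BASE = "https://artwork.example/banners/"


def thumb_url(filename: Optional[str]) -> Optional[str]:
    """Return a thumbnail URL only when the API supplied a filename.

    Returning None (rather than the bare base URL) leaves Kodi free to
    fall back to a generated thumbnail.
    """
    if not filename or not filename.strip():
        return None
    return BASE + filename.strip()


print(thumb_url("episodes/123/456.jpg"))
print(thumb_url(""))
```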

Fixed version:
metadata.tvdb.com.xslt
Reply
#9
The old scraper couldn't find a proper result for the TV show Second Chance. I tried this one and it didn't match it either. I renamed the folder to "Second.Chance (2016)" but that didn't help either.
Reply
#10
"Second Chance (2016)" works fine for me. Would need to see a debug log to help further.
Reply
#11
I've got a little problem with a show called Fiskedrag when set to Swedish (sv) http://thetvdb.com/?tab=series&id=92561&lid=8
I have no nfo for that show btw.



21:37:01 T:1256 NOTICE: VideoInfoScanner: Finished scan. Scanning for video info took 00:11
21:37:30 T:11012 ERROR: VideoInfoScanner: Asked to lookup episode davs://USERNAMETongue[email protected]:443/dav/pub/tv-serier/Fiskedrag/Season%2001/fiskedrag.s01e01.xvid.avi online, but we have no episode guide. Check your tvshow.nfo and make sure the <episodeguide> tag is in place.
21:37:30 T:11012 ERROR: VideoInfoScanner: Asked to lookup episode davs://USERNAMETongue[email protected]:443/dav/pub/tv-serier/Fiskedrag/Season%2001/fiskedrag.s01e02.xvid.avi online, but we have no episode guide. Check your tvshow.nfo and make sure the <episodeguide> tag is in place.


Also Seinfeld worked great with dvdorder (thanks!!!!)
Reply
#12
Need the full debug log. It's working fine here.

Image
Reply
#13
(2016-02-22, 22:28)scudlee Wrote: "Second Chance (2016)" works fine for me. Would need to see a debug log to help further.

Sorry, my fault. I forgot there was a tvshow.nfo file. I deleted it and forced a rescan via the change content menu, and now it scraped. Thanks!
Reply
#14
OK, I started Kodi to generate a new debug log and then it worked. I probably did something stupid the last time, then.
Reply
#15
Works nicely so far with my collection. Right now I'm switching various folders over to DVD Order where appropriate. Will report any odd findings back. So far no issues!
Reply
