The logic and future of Music scrapers?

DaveBlake Offline
Team-Kodi Member
Posts: 2,142
Joined: Jun 2015
Reputation: 56
Location: South West England
Post: #46
I think that PR12120 makes the core changes needed to address points #1 & #2 of the first post.

A reminder:

(2017-02-07 17:45)ronie Wrote:  1) if the 'prefer online info' setting is disabled, we pass the artistname to the artist scraper.
if the setting is enabled, we pass the artist mbid to the scraper.

why don't we always pass the mbid (if available) regardless of this setting?

2) if the album scraper returns no results, we completely skip the artist scraper. why?

3) if the 'prefer online info' setting is enabled, and 'show song and album artists' is enabled:
this causes the same artist being listed twice in your library if the artistname in your tags does not 100% match the artistname the scraper returns.

for instance "The B-52's" vs. "The B-52s":
3.1) i have all songs of an album tagged with artist "The B-52's"
3.2) we start the album scanner and it returns the mbid for this artist
3.3) we pass this mbid to the artist scraper and it returns info for "The B-52s" and kodi adds it to the db.
3.4) kodi now scans all songs for 'additional' artists. it finds "The B-52's" and checks if it's already in the db... nope
3.5) we pass "The B-52's" to the artist scraper and it returns info for whatever closest match it can find and kodi adds this artist to the db

ref: https://github.com/xbmc/xbmc/blob/99c25f...#L843-L883

#1. With the PR, scraped album and artist mbids are stored. Hence, having scraped an album, the mbids for the artists returned with the results are added to the db (if we didn't have them) and are available for scraping the album artists, or for other addons to use to fetch art etc.

Of course the artists returned by the scraper can differ from those in the library. This could happen when the user has tagged with mbids but then edited the names (they do that), but even with album lookup by album and artist name this can happen too, especially with classical music [My example is Dvorak Symphony No. 7]. In that case, when "prefer online info" is disabled, we can still store any returned mbids where the names match.

#2. I think the "why" was to avoid repeatedly trying to get artist data that Musicbrainz didn't have. For example, say I have 3 albums by artist1, none of them in the MB database; on scanning it would check the albums and go no further. Otherwise it would have requested the artist for every album and every song. Anyway, with the PR it does scrape artists even if the album fails, but uses a list to ensure each attempt is made only once per run.
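The once-per-run guard described for #2 could look roughly like the sketch below. It is Python for brevity rather than Kodi's C++, and `ArtistScrapeRun`, `scrape_once` and the `lookup` callable are hypothetical names for illustration, not anything taken from PR12120.

```python
from typing import Callable, Optional


class ArtistScrapeRun:
    """Remembers which artists were already tried during one scan run."""

    def __init__(self, lookup: Callable[[str], Optional[dict]]):
        self._lookup = lookup      # hypothetical scraper call
        self._attempted = set()    # artist keys (mbid or name) tried so far

    def scrape_once(self, artist_key: str) -> Optional[dict]:
        """Call the scraper at most once per artist key in this run."""
        if artist_key in self._attempted:
            return None            # already tried this run, skip the request
        self._attempted.add(artist_key)
        return self._lookup(artist_key)
```

With three albums by the same unknown artist, only the first call reaches the scraper; the other two are skipped without a request.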

#3. I'm not so sure I have resolved this, and I need to do more testing. But I'm not sure I can reproduce it either. Assuming that the music has no mbid in tags, I just get The B52's replaced with whatever the scraper finds, e.g. "The 'B' Girls". It mis-identifies wildly, not spotting that "The B52's" is "The B52s".
(This post was last modified: 2017-05-18 17:46 by DaveBlake.)
DaveBlake Offline
Post: #47
Having a tedious time testing as getting lots of 503 server timeout errors.

This has made me wonder if, with refresh (from the info dialog) at least, on a failed lookup by name we should separate the returns between
a) Musicbrainz server accessed but item not found
and
b) The scraping process failed for technical reasons - 503 errors mostly

At the moment both scenarios return INFO_NOT_FOUND and let the user re-enter the names. While this is worthwhile for a), it is misleading for b).

In the b) case it would be better to tell the user that a timeout happened and to try again later. I went off to check the log to see what happened, but the normal user isn't going to do that. They just get pissed that Kodi can't find images and info etc.

It also makes me wonder about #2 in the previous post. Say there are no mbids and you get a 503 on album lookup; then the subsequent artist lookup will be by name only. Even if successful, it is less accurate than the album lookup and more likely to get the wrong artist. Perhaps another reason the original approach did not attempt artist if the album failed?

Again, a split return makes more sense to me. If the album lookup 503s, then don't attempt the artist; if the album is just unknown to Musicbrainz, then do look up the artist on just the name, as it is the best we can do.
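A minimal sketch of that split return, in Python rather than Kodi's C++. `LookupResult`, `classify` and `should_scrape_artist_by_name` are made-up names for illustration; Kodi currently reports both cases as INFO_NOT_FOUND, and treating any non-200 status as a technical failure is a simplifying assumption.

```python
from enum import Enum, auto


class LookupResult(Enum):
    FOUND = auto()
    NOT_FOUND = auto()      # a) server answered, item unknown
    SERVER_ERROR = auto()   # b) technical failure, e.g. HTTP 503


def classify(http_status: int, match_count: int) -> LookupResult:
    """Separate 'server said no' from 'server could not be asked'."""
    if http_status != 200:
        return LookupResult.SERVER_ERROR
    return LookupResult.FOUND if match_count > 0 else LookupResult.NOT_FOUND


def should_scrape_artist_by_name(album_result: LookupResult) -> bool:
    """Name-only artist lookup is the fallback only for a genuinely unknown album.

    FOUND means the artist mbids came back with the album, so no name lookup
    is needed; SERVER_ERROR means wait and retry the album later.
    """
    return album_result is LookupResult.NOT_FOUND
```

The same three-way value could drive the info dialog: re-enter names on NOT_FOUND, show a "try again later" message on SERVER_ERROR.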

Thoughts anyone?
ironic_monkey Offline
Posting Freak
Posts: 1,383
Joined: Nov 2013
Reputation: 65
Post: #48
i don't think album being found was used as a sanity check. there is a scoring that uses a weighted fuzzy match on artist, album and year, and a threshold for this (0.9 or something, do not recall) for an album to be auto-chosen. artist is more wonky as all you had was the name. always was.
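To illustrate the kind of scoring described above, here is a rough Python sketch. The weights, the use of `difflib.SequenceMatcher` and the exact 0.9 threshold are all assumptions for the example (the post only recalls "0.9 or something"); this is not Kodi's actual scoring code.

```python
from difflib import SequenceMatcher

# Assumed weights and threshold, chosen only for illustration.
WEIGHTS = {"artist": 0.4, "album": 0.4, "year": 0.2}
AUTO_CHOOSE_THRESHOLD = 0.9


def similarity(a: str, b: str) -> float:
    """Case-insensitive fuzzy similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def album_score(tagged: dict, candidate: dict) -> float:
    """Weighted sum of artist, album and year agreement."""
    year_score = 1.0 if tagged["year"] == candidate["year"] else 0.0
    return (WEIGHTS["artist"] * similarity(tagged["artist"], candidate["artist"])
            + WEIGHTS["album"] * similarity(tagged["album"], candidate["album"])
            + WEIGHTS["year"] * year_score)


def auto_choose(tagged: dict, candidates: list):
    """Return the best candidate if it clears the threshold, else None."""
    best = max(candidates, key=lambda c: album_score(tagged, c), default=None)
    if best is not None and album_score(tagged, best) >= AUTO_CHOOSE_THRESHOLD:
        return best
    return None
```

With album and year to lean on, "The B-52's" against "The B-52s" still clears the threshold. A name-only artist lookup has no such extra evidence, which is why it is wonkier.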

it makes perfect sense to distinguish error codes, just wasn't a thing back when the code was written (it was true or false).
DaveBlake Offline
Team-Kodi Member
Post: #49
Thanks for input Spiff.

So it may not have been a sanity check, but maybe it should be one now? If we save the artist mbids returned by a successful album lookup and can use them to subsequently look up the artist, then perhaps it is better for the auto scraping (fetch on library update) to skip the artist until we get the album and it can do a better job? I feel that wrong artist info is far worse than none; for missing stuff, go try again (manually) later. But maybe that is just me, OCD about controlling my lib contents?

Or even always retry immediately on a 503 error? Picard hits the same server overload issue as Kodi, and their devs are thinking about doing this, see https://tickets.metabrainz.org/browse/PICARD-807
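A retry on 503 could be as small as the Python sketch below. `fetch_with_retry` and its parameters are hypothetical, and the linear delay between attempts is an added assumption (pass `delay_seconds=0` for a truly immediate retry); it is not how Kodi or Picard actually does it.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(fetch: Callable[[], Tuple[int, str]],
                     max_attempts: int = 3,
                     delay_seconds: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call fetch() again while it reports HTTP 503, up to max_attempts in total."""
    status, body = fetch()
    for attempt in range(1, max_attempts):
        if status != 503:
            break                          # success or a different error: stop
        sleep(delay_seconds * attempt)     # linear backoff (assumption)
        status, body = fetch()
    return status, body
```

If every attempt returns 503, the caller still gets the 503 back and can report a server problem rather than "not found".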

I like the idea of distinguishing error codes, but I'm not so sure about the implementation. How generic does it need to be? CCurlFile::FillBuffer raises the "HTTP returned error 503" error, but getting it flagged back to the scraper level is not so obvious (I'm scared to mess with CCurlFile).
ironic_monkey Offline
Post: #50
i agree, with mbid it's better to delay so we can have more info. and i totally share your ocd, before the days of spotify i had nfo files for everything ;)

it will be slightly tricky to propagate the error code across the interfaces. you will not have access to the file instance you need at scanner level since it will be buried in several layers. i think the least invasive approach is hooking up some callback interface in CCurlFile and then flagging the enabling of this through e.g. a protocol option. only naughty bit with this design is that the callback class would likely have to sit in some global instance.
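The shape of that callback design, sketched in Python rather than C++. Everything here is hypothetical: `HttpErrorMonitor`, the shared `http_error_monitor` instance and the `report_errors` flag stand in for the callback interface, the global instance and the protocol option; none of them are real CCurlFile APIs.

```python
class HttpErrorMonitor:
    """Holds an optional callback that is told about HTTP errors."""

    def __init__(self):
        self._callback = None

    def register(self, callback):
        self._callback = callback

    def clear(self):
        self._callback = None

    def notify(self, url, status):
        if self._callback is not None:
            self._callback(url, status)


# Single shared instance: the "naughty bit" of the design.
http_error_monitor = HttpErrorMonitor()


def fetch(url, transport, report_errors=False):
    """Low-level fetch; report_errors plays the role of the protocol option."""
    status, body = transport(url)
    if status >= 400:
        if report_errors:
            http_error_monitor.notify(url, status)
        return None          # existing callers still just see a failure
    return body
```

The scanner would register a callback before scraping and clear it afterwards, so the layers in between never need to pass the status code through their interfaces.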