v18 Scraping artist MBID (PR#12120)
Scott, firstly thanks for looking at this PR and thinking about the consequences of the design change; it needs exactly that kind of review.

Of course full support for artist alias names is something I would like to add to Kodi (now on my public list), but it is not happening today.

Your concern is around an incorrect MBID being attached to an artist, how this gets corrected, and how it then stays that way. I get that completely, hence when storing scraped MBIDs it also has to store a flag that differentiates between MBIDs derived from tags and those added by the scraper. Refresh of info does not use the scraped MBIDs but does a lookup on name instead.
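
To make the flag idea concrete, here is a minimal Python sketch. It is not the Kodi code: the names `Artist`, `mbid_scraped` and `refresh_lookup_key` are hypothetical, and I am assuming that an MBID that came from tags is still used for refresh. It only illustrates that a refresh falls back to name lookup when the MBID was added by the scraper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Artist:
    name: str
    mbid: Optional[str] = None
    # Hypothetical flag: True when the MBID came from the scraper, not from tags.
    mbid_scraped: bool = False


def refresh_lookup_key(artist: Artist) -> Tuple[str, str]:
    """Return what a refresh of artist info would look up on in this toy model."""
    if artist.mbid and not artist.mbid_scraped:
        # Assumption: tag-derived MBIDs are trusted for refresh.
        return ("mbid", artist.mbid)
    # Scraped MBIDs are not used for refresh, so fall back to the name.
    return ("name", artist.name)
```

So an artist whose MBID was scraped is refreshed by name, which gives a manual refresh the chance to land on a different (hopefully correct) artist.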

But your questions also relate to using the "Prefer online information" option, and to your own choices of names in tags alongside MBIDs. The key issue here is that the behaviour varies depending on whether the artist is already in the library or is being added for the first time. The variety of behaviour does worry me sometimes. Add to it hidden 503 server timeout errors, which mean that scraping even well known music fails, and the results from a user viewpoint can seem non-deterministic, or at the very least confusing. Let's dig in a bit.

(2017-05-23, 02:27) scott967 wrote: 1. User has tagged an artist name string and MBID. Subsequently scraper runs and finds album containing the track with a different name string but identical MBID. What happens in the database?
"Prefer online information" is disabled:
Auto scraping on lib update ("fetch additional info on update" enabled too) will scrape that artist using the MBID if it has not been scraped before, adding info, but will not change the name. The artist display strings (not always the same as the union of the artist names) for the album and songs also will not change. Album scraping will not impact the artist info at all (we already have the artist MBID from tags).

"Prefer online information" is enabled:
The auto scraping when music is first added to the library will replace any artist name in the artist table derived from tags with the name it scraped. I think that is a correct implementation of what "prefer online information" means - use what MB gives, not what is in the tags. But the artist display string (in the album and song tables) is not updated, and it probably should be.

However album scraping (from Info dialog for "Query Info for all") still does not impact the artist info, and neither does artist scraping (from Info dialog for "Query Info for all") - Kodi always uses the first artist name it is given for that MBID. When adding new music, if the artist is already in the library and has been scraped, then the artist name remains unchanged.

This means that for artist names "Prefer online information" only applies when the item is added, even though for other things (album genres, year, compilation and label) this option applies to all scraping. This should probably be made consistent.
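
As a rough summary of case 1, the behaviour described above can be written as a single decision. This is only a sketch of what I have described, not the actual code, and the function and parameter names are made up for illustration.

```python
def artist_name_after_scrape(current_name, scraped_name, prefer_online,
                             auto_scrape_on_add, already_scraped):
    """Toy model of which artist name ends up in the artist table (case 1).

    auto_scrape_on_add: True for auto scraping when music is first added,
    False for manual scraping from the Info dialog.
    """
    if prefer_online and auto_scrape_on_add and not already_scraped:
        # Only this path replaces the name from tags with the scraped one.
        return scraped_name
    # Every other path keeps the first name given for that MBID.
    return current_name
```

Only one of the combinations changes the name, which is the inconsistency with how the option is applied to album genres, year, compilation and label.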

Quote: 2. User has tagged an artist name string and no MBID. Again scraper runs and finds album containing the track with a different name string and MBID. What happens in the database?
"Prefer online information" is disabled:
Auto scraping will scrape the album artists using name - we may have artist MBIDs back from scraping the album, but as these do not match the artist names we have they are not stored, so all we have is artist name. This form of lookup is the least accurate, artist name is far from unique and the scraper just returns the first found.

"Prefer online information" is enabled:
The auto scraping adds the scraped artist name and MBID pair as a new artist (if that MBID does not already exist in the artist table), setting it as album artist. But the artist display string (in the album and song tables) is not updated, and again it probably should be. Worse, the artist name from tags will remain as an artist table record and the songs will be credited to that artist (in the song_artist table).

In short you end up with a mix of both the artist from tags and the one that was scraped.
The scraped album artist info should probably be applied to the songs too, but do we then use the other scraped song info too? That is moving into a whole new area for what "Prefer online information" does.
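
To show the mix, here is a toy Python model of case 2 with "Prefer online information" enabled. The dictionary layout and the function name `scrape_album_prefer_online` are hypothetical and only loosely stand in for the artist, album and song_artist tables; it is a sketch of the outcome described above, not the real implementation.

```python
def scrape_album_prefer_online(library, scraped_name, scraped_mbid):
    """Toy model of case 2 with "Prefer online information" enabled."""
    artists = library["artists"]  # list of (name, mbid) pairs
    # Add the scraped pair only if that MBID is not in the artist table yet.
    if all(mbid != scraped_mbid for _, mbid in artists):
        artists.append((scraped_name, scraped_mbid))
    # The album is now credited to the scraped artist...
    library["album_artist"] = scraped_name
    # ...but the display strings and the song credits are left alone.
    return library


library = {
    "artists": [("artist_A", None)],   # artist from tags, no MBID
    "album_artist": "artist_A",
    "album_display": "artist_A",       # artist display string
    "song_artist": "artist_A",
}
scrape_album_prefer_online(library, "artist_B", "MBID_B")
```

After the call the artist table holds both (artist_A, null) and (artist_B, MBID_B), the album artist is artist_B, while the display string and the song credit still say artist_A.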

Quote: 3. As a follow on to 2, assume after scraping user adds new track with the same artist name string as in 2, but now user also has tagged the MBID then updates the database. What happens in the database?
"Prefer online information" is disabled:
We have that artist as a name without MBID, tag scanning will match by name and apply the MBID to it. If not scraped already, auto scraping will scrape the album artists using MBID. If info for the wrong artist was inaccurately scraped before then that remains incorrect.

"Prefer online information" is enabled:
Say we have (artist_A, null), (artist_B, MBID_B) from 2, and now we scan tags with (artist_A, MBID_B). MBID takes precedence, and we already have that MBID from scraping in 2, so the tag artist name artist_A does not get applied during scanning. Hence the artist records remain unchanged.
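
The matching order in case 3 can be sketched like this. Again this is a simplified Python illustration with made-up names (`scan_tag_artist`, dicts with 'name' and 'mbid'), covering only the two situations described above, not every path in the real scanner.

```python
def scan_tag_artist(artists, tag_name, tag_mbid):
    """Toy model of matching a tagged artist against the artist table.

    artists: list of dicts with 'name' and 'mbid'. Returns the matched record.
    """
    # 1. MBID takes precedence: an existing record with that MBID wins,
    #    and the name from tags is not applied to it.
    if tag_mbid is not None:
        for artist in artists:
            if artist["mbid"] == tag_mbid:
                return artist
    # 2. Otherwise match by name; a record without an MBID picks up the tag MBID.
    for artist in artists:
        if artist["name"] == tag_name and (artist["mbid"] is None or tag_mbid is None):
            if artist["mbid"] is None:
                artist["mbid"] = tag_mbid
            return artist
    # 3. No match at all: add a new artist record.
    new_artist = {"name": tag_name, "mbid": tag_mbid}
    artists.append(new_artist)
    return new_artist
```

With (artist_A, null) and (artist_B, MBID_B) in the table, scanning (artist_A, MBID_B) stops at step 1 and returns artist_B, leaving both records untouched. With only (artist_A, null) in the table, the same tags fall through to step 2 and MBID_B is applied to artist_A.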

Quote: My experience in the past (through 17.1) is that once an incorrect MBID (incorrect from the user's perspective) is tied to an artist string name, it is impossible to fix without removing all tracks containing that artist in any role, cleaning the database, then restoring and scanning the tracks (after correcting track tags).
Issues caused by inaccurate scraping capturing the wrong MBID and info can be resolved by a manual refresh from the Info dialog. But things changed by scraping with "Prefer online information" enabled cannot. The only way back to just tags is the remove-clean-rescan rigmarole.

How "Prefer online information" was applied to album artists was already flaky, and PR12120 has not helped matters. I sometimes wonder exactly what Kodi is trying to achieve with the "Prefer online information" option. Without MBIDs I think this is the "I have tagged my music badly, but hope the scraper can fix it" option - it might, or it might make a bigger mess. With MBIDs maybe this could be an "update my lib with the latest online info" option, but since we don't re-scrape anything previously scraped (only those with a null lastscraped date) that cannot be the case. To me if you have MBID tags then you will also have the other tag data correct, and the derived data does not need to be overwritten by the scraper. The implementation is historic and slightly inconsistent; as to the original intent, I can only guess.

It would help to be clearer on what we actually want scraping an album with "Prefer online information" enabled to do.


Messages In This Thread
Scraping artist MBID (PR#12120) - by scott967 - 2017-05-23, 02:27
RE: Scraping artist MBID (PR#12120) - by DaveBlake - 2017-05-23, 14:44