v18 Scraping artist MBID (PR#12120)
#1
I have a concern about the PR as it relates to the scraper adding artist MBIDs to the music database.

Currently, Musicbrainz has the concept of artist aliases, which isn't implemented in the Kodi database. As a result, it is possible that user tags include an artist name string with an MBID, where that name conflicts with the primary artist name string found on Musicbrainz. There is an option in Kodi, "Prefer online information", which will normalize the artist name string to the Musicbrainz primary name. However, if the user does not set this switch, the tag name string is retained in the database (with or without MBID).

So my question is what happens in the following cases:

1. User has tagged an artist name string and MBID. Subsequently scraper runs and finds album containing the track with a different name string but identical MBID. What happens in the database?

2. User has tagged an artist name string and no MBID. Again scraper runs and finds album containing the track with a different name string and MBID. What happens in the database?

3. As a follow on to 2, assume after scraping user adds new track with the same artist name string as in 2, but now user also has tagged the MBID then updates the database. What happens in the database?

My experience in the past (through 17.1) is that once an incorrect MBID (incorrect from the user's perspective) is tied to an artist string name, it is impossible to fix without removing all tracks containing that artist in any role, cleaning the database, then restoring and scanning the tracks (after correcting track tags).

scott s.
#2
Scott, firstly thanks for looking at this PR and thinking about the design change consequences, it needs exactly that kind of review.

Of course full support for artist alias names is something I would like to add to Kodi (now on my public list), but it is not happening today.

Your concern is around an incorrect MBID being attached to an artist, how this gets corrected, and how it then stays that way. I get that completely, hence when storing scraped MBIDs Kodi also has to store a flag that differentiates between MBIDs derived from tags and those added by the scraper. Refresh of info does not use the scraped MBIDs but does a lookup on name.
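To make the flag idea concrete, here is a minimal sketch in Python (not Kodi's actual C++ code). The names `ArtistRecord`, `mbid_scraped` and `refresh_lookup_key` are made up for illustration; the only thing taken from the description above is that a scraped MBID is marked as such and that refresh falls back to a name lookup for it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ArtistRecord:
    name: str
    mbid: Optional[str] = None
    # Hypothetical flag: True when the MBID was added by the scraper,
    # False when it came from the music file tags.
    mbid_scraped: bool = False


def refresh_lookup_key(artist: ArtistRecord) -> Tuple[str, str]:
    """Pick the key used when refreshing info for an artist.

    Only an MBID that came from tags is trusted for lookup; a scraped
    MBID is ignored and the artist is looked up by name again.
    """
    if artist.mbid is not None and not artist.mbid_scraped:
        return ("mbid", artist.mbid)
    return ("name", artist.name)
```

With this, a wrongly scraped MBID never feeds back into the next refresh, which is what lets a manual refresh correct it.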

But your questions also relate to using the "Prefer online information" option, and to your own choice of names in tags alongside MBIDs. The key issue here is that the behaviour varies depending on whether the artist is already in the library or is being added for the first time. The variety of behaviour does worry me sometimes. Add to it hidden 503 server timeout errors, which mean scraping even well known music fails, and the results from a user viewpoint can seem non-deterministic, or at the very least confusing. Let's dig in a bit.

(2017-05-23, 02:27)scott967 Wrote: 1. User has tagged an artist name string and MBID. Subsequently scraper runs and finds album containing the track with a different name string but identical MBID. What happens in the database?
"Prefer online information" is disabled:
Auto scraping on lib update ("fetch additional info on update" enabled too) will scrape that artist using the MBID if it has not been scraped before, adding info but not changing the name. The artist display strings (not always the same as the union of the artist names) for the album and songs also will not change. Album scraping will not impact the artist info at all (we already have the artist MBID from tags).

"Prefer online information" is enabled:
The auto scraping when music is first added to the library will replace any artist name in the artist table derived from tags with the name it scraped. I think that is a correct implementation of what "prefer online information" means - use what MB gives, not what is in the tags. But the artist display string (in the album and song tables) is not updated, and it probably should be.

However album scraping (from Info dialog for "Query Info for all") still does not impact the artist info, and neither does artist scraping (from Info dialog for "Query Info for all") - Kodi always uses the first artist name it is given for that MBID. When adding new music, if the artist is already in the library and has been scraped, then the artist name remains unchanged.

This means that for artist names "Prefer online information" only applies when the item is added, even though for other things (album genres, year, compilation and label) this option applies to all scraping. This should probably be made consistent.
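The case 1 behaviour described above boils down to one condition. A rough Python sketch of my reading of it follows; the function and parameter names are invented for this post and do not exist in Kodi.

```python
def artist_name_after_scrape(current_name: str, scraped_name: str, *,
                             prefer_online: bool,
                             auto_scrape_on_add: bool,
                             already_scraped: bool) -> str:
    """Return the artist table name after a scrape, for an artist whose
    MBID is already known from tags (case 1).

    The scraped name only wins when "Prefer online information" is on,
    the scrape is the automatic one done as the music is first added,
    and the artist has not been scraped before. Manual scraping from the
    Info dialog never changes the name.
    """
    if prefer_online and auto_scrape_on_add and not already_scraped:
        return scraped_name
    return current_name
```

Making the option consistent would mean dropping the `auto_scrape_on_add` part of that condition, so that it behaves the way it already does for album genres, year, compilation and label.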

Quote:2. User has tagged an artist name string and no MBID. Again scraper runs and finds album containing the track with a different name string and MBID. What happens in the database?
"Prefer online information" is disabled:
Auto scraping will scrape the album artists by name - we may have artist MBIDs back from scraping the album, but as these do not match the artist names we have they are not stored, so all we have is the artist name. This form of lookup is the least accurate: artist name is far from unique and the scraper just returns the first one found.

"Prefer online information" is enabled:
The auto scraping adds the scraped artist name and MBID pair as a new artist (if that MBID does not already exist in the artist table), setting it as album artist. But the artist display string (in the album and song tables) is not updated, and again it probably should be. Worse, the artist name from tags will remain as an artist table record and the songs will be credited to that artist (in the song_artist table).

In short you end up with a mix of both the artist from tags and the one scraped.
The scraped album artist info should probably be applied to the songs too, but do we then use the other scraped song info as well? That moves into a whole new area for what "Prefer online information" does.
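To show how case 2 ends up with that mix, here is a small Python model of the artist table as a list of dicts. It is only a sketch of the behaviour described above: `scrape_album_artist` and the row layout are hypothetical, and the branch where the scraped name equals the tag name is my assumption about what "stored" means.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def scrape_album_artist(artists: List[Row], tag_name: str,
                        scraped_name: str, scraped_mbid: str,
                        prefer_online: bool) -> Tuple[Row, Row]:
    """Return (album_artist_row, song_artist_row) after scraping an album
    whose artist was tagged with a name but no MBID."""
    tag_row = next(r for r in artists if r["name"] == tag_name)
    if scraped_name == tag_name:
        # Assumption: when the names agree the scraped MBID is stored.
        if tag_row["mbid"] is None:
            tag_row["mbid"] = scraped_mbid
        return tag_row, tag_row
    if not prefer_online:
        # Names differ, so the scraped MBID is not stored at all.
        return tag_row, tag_row
    # Names differ and online info is preferred: reuse or add the scraped
    # artist as album artist, while songs stay credited to the tag artist.
    scraped_row = next((r for r in artists if r["mbid"] == scraped_mbid), None)
    if scraped_row is None:
        scraped_row = {"name": scraped_name, "mbid": scraped_mbid}
        artists.append(scraped_row)
    return scraped_row, tag_row
```

Calling it with `prefer_online=True` on a table holding only `artist_A` leaves two rows behind, with the album pointing at one and the songs at the other.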

Quote:3. As a follow on to 2, assume after scraping user adds new track with the same artist name string as in 2, but now user also has tagged the MBID then updates the database. What happens in the database?
"Prefer online information" is disabled:
We have that artist as a name without MBID, tag scanning will match by name and apply the MBID to it. If not scraped already, auto scraping will scrape the album artists using MBID. If info for the wrong artist was inaccurately scraped before then that remains incorrect.

"Prefer online information" is enabled:
Say we have (artist_A, null), (artist_B, MBID_B) from 2, and now we scan tags with (artist_A, MBID_B). MBID takes precedence, and we already have that MBID from scraping in 2, so the tag artist name artist_A does not get applied during scanning. Hence the artist records remain unchanged.
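The matching order during tag scanning in case 3 can be sketched like this in Python. Again this is an illustration, not the real implementation: `add_artist_from_tags` is a made-up name, and the exact name-match rule (only when the MBIDs do not conflict) is an assumption on my part.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def add_artist_from_tags(table: List[Row], name: str,
                         mbid: Optional[str]) -> Row:
    """Find or create the artist row for a (name, MBID) pair read from tags."""
    # 1. MBID takes precedence: an existing row with this MBID is reused
    #    and keeps whatever name it already has.
    if mbid is not None:
        for row in table:
            if row["mbid"] == mbid:
                return row
    # 2. Otherwise match by name (assumption: only when the MBIDs do not
    #    conflict), applying the tag MBID to a row that had none.
    for row in table:
        if row["name"] == name and (row["mbid"] is None or mbid is None):
            if row["mbid"] is None:
                row["mbid"] = mbid
            return row
    # 3. No match: add a new artist.
    new_row: Row = {"name": name, "mbid": mbid}
    table.append(new_row)
    return new_row


# "Prefer online information" enabled: table left over from case 2.
table = [{"name": "artist_A", "mbid": None},
         {"name": "artist_B", "mbid": "MBID_B"}]
matched = add_artist_from_tags(table, "artist_A", "MBID_B")
print(matched["name"])  # artist_B: the tag name artist_A is not applied
```

With the option disabled the table only holds (artist_A, null), so step 2 matches by name and the MBID from tags is applied to that row.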

Quote:My experience in the past (through 17.1) is that once an incorrect MBID (incorrect from the user's perspective) is tied to an artist string name, it is impossible to fix without removing all tracks containing that artist in any role, cleaning the database, then restoring and scanning the tracks (after correcting track tags).
Issues caused by inaccurate scraping capturing the wrong MBID and info can be resolved by a manual refresh from the info dialog. But things changed by scraping with "Prefer online information" enabled cannot. The only way back to just tags is the remove-clean-rescan rigmarole.

How "Prefer online information" was applied to album artists was already flaky, and PR12120 has not helped matters. I sometimes wonder exactly what Kodi is trying to achieve with the "Prefer online information" option. Without MBIDs I think this is the "I have tagged my music badly, but hope the scraper can fix it" option - it might, or it might make a bigger mess. With MBIDs maybe this could be an "update my lib with the latest online info" option, but since we don't re-scrape anything previously scraped (only items with a null lastscraped date) that cannot be the case. To me, if you have MBID tags then you will also have the other tag data correct, and the derived data does not need to be overwritten by the scraper. The implementation is historic and slightly inconsistent, so I can only guess.

It would help to be clearer on what we actually want scraping an album with "Prefer online information" enabled to do.
#3
Have just chatted to @night199uk on IRC, one of the guys that implemented the original Musicbrainz integration in Kodi. His view was that the "Prefer online information" setting was really to allow Kodi to use the MBIDs from tags to accurately fetch all the other information, and regularly update it. Interesting to know.

Of course Musicbrainz entries change, artist credits get corrected along with other facts about the release, so I guess "Prefer online information" for music with MBID tags should mean album and song artist credits get replaced with the latest data, and that includes the artist name.

But trying to look up on names alone, even album and artist combined, get MBIDs, and then modify the library by changing the album to artist relationships and identities, is fraught. I question if it makes any sense to try and sort out a mix of stuff, some with MBIDs and some without.

Maybe the solution is to say that changing artist credits (because "Prefer online information" is enabled, or controlled by some new setting) only happens for music with MBIDs derived from tags. If we have done a lookup using just name, and thus scraped the album and artist MBIDs, then we can fetch other album and artist details e.g. dates, review, styles etc. and overwrite everything but the artist credits.
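As a sketch of that rule in Python (the function name `merge_scraped_album` and the `artist_credits` key are placeholders I have made up, nothing like this exists in Kodi yet):

```python
from typing import Any, Dict


def merge_scraped_album(tag_album: Dict[str, Any],
                        scraped_album: Dict[str, Any],
                        mbid_from_tags: bool) -> Dict[str, Any]:
    """Overlay scraped album details on the tag-derived ones.

    Artist credits are only replaced when the album MBID came from tags;
    after a name-only lookup every other scraped field is still applied.
    """
    merged = dict(tag_album)
    for key, value in scraped_album.items():
        if key == "artist_credits" and not mbid_from_tags:
            continue  # keep the artist credits from tags
        merged[key] = value
    return merged
```

So a name-based lookup can still bring in dates, review and styles, but can never rearrange who the album is credited to.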

Proposal:
Replace "Prefer online information" setting with two new settings:
a) "Prefer album information from tags" - means take album genres, year, compilation and label from tags (not scraped from NFO or online sources). This is the only information other than albumartist credits we currently "overwrite". In the future artist genre could be added to this list, so maybe a better name can be found? But I want to make it really clear that info from tags is involved.

b) "Update artist credits for MBID tagged music" - indicates that artist credits for both album and songs, and the artist names, will be updated based on the MBID tags, ignoring what other tag values may say.

To go with b) we also need a facility to automatically re-scrape items with MBIDs from tags.

Currently only items with lastscraped = null get scraped automatically, so once something is scraped it can only be re-scraped one at a time from the refresh button on the info dialog. Of course it can also be automatically updated if the music file itself is altered in some way.
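The difference between the current selection and what b) would need can be sketched as follows in Python. `LibraryItem`, `has_tag_mbid` and the `rescrape_tagged_mbids` switch are hypothetical names for this illustration; only the "lastscraped is null" rule comes from how Kodi behaves today.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class LibraryItem:
    title: str
    last_scraped: Optional[datetime]  # None means never scraped
    has_tag_mbid: bool                # MBID came from the file tags


def items_to_scrape(items: List[LibraryItem],
                    rescrape_tagged_mbids: bool = False) -> List[LibraryItem]:
    """Select items for automatic scraping.

    Default: only items never scraped before (current behaviour).
    With rescrape_tagged_mbids set (hypothetical, to go with setting b),
    items with MBIDs from tags are scraped again as well.
    """
    return [item for item in items
            if item.last_scraped is None
            or (rescrape_tagged_mbids and item.has_tag_mbid)]
```

How often such a re-scrape should run, and whether it needs throttling against the server, is left open here.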