Artist Scraper
Data Fetching Strategy
The overall approach taken by the artist scraper is:
- If we don't have an MBID:
  - try to find the artist by name at MusicBrainz to get its ID
  - failing that, fall back to Discogs (getting the Discogs ID?)
- Get all details of the artist from all possible sources: MusicBrainz, Discogs, TADB, fanart.tv and AllMusic.
- Merge the results, least accurate first.
- Finally, apply user preferences for the source of each data value.
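To make the last two steps concrete, here is a minimal sketch of "merge least accurate first, then apply preferences". It is not the scraper's actual code: the function name, the site keys and especially the accuracy ranking in SOURCE_ORDER are assumptions made up for illustration.

```python
# Hypothetical ranking, least accurate first. The real order is not stated here.
SOURCE_ORDER = ["allmusic", "fanarttv", "theaudiodb", "discogs", "musicbrainz"]


def merge_results(results, preferences=None):
    """Merge per-site results into one dict of artist details.

    results:     {site: {field: value}}
    preferences: {field: preferred site} (hypothetical user settings)
    """
    merged = {}
    # Later (more accurate) sites overwrite earlier ones, but only with
    # non-empty values, so a blank field never wipes out real data.
    for site in SOURCE_ORDER:
        for field, value in results.get(site, {}).items():
            if value:
                merged[field] = value
    # A user preference wins over the ranking, provided the preferred
    # site actually returned something for that field.
    for field, site in (preferences or {}).items():
        value = results.get(site, {}).get(field)
        if value:
            merged[field] = value
    return merged


results = {
    "musicbrainz": {"name": "Artist A", "genre": ["rock"]},
    "discogs": {"name": "Artist A (2)", "biography": "Discogs bio"},
    "theaudiodb": {"genre": ["indie"], "biography": "TADB bio"},
}
details = merge_results(results, preferences={"genre": "theaudiodb"})
```

With this ranking the name comes from MusicBrainz and the biography from Discogs, while the genre preference pulls the TADB value over the MusicBrainz one.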
@ronie, a few questions on that strategy:
- What is used to look up artist details on the TADB, fanart.tv and AllMusic sites: MBID, name, some other info from MusicBrainz, Discogs ID (or some other info from Discogs)?
- How do we ensure the results from all the different sites are for the same artist? Name alone is not enough.
- How do the preferred source sites for each value get applied after merging the results? I just don't see it.
Optional Sites
We also discussed making it optional not only to prefer certain source sites, but also to pick which sites are accessed. Look everywhere by default, but let those with more specialist collections, or those that just want, say, art quickly, adjust where data is retrieved from (other than the MusicBrainz lookup falling back to Discogs).
I can see the sense in requesting all data from all remote sites: how is the user meant to know which sites have data for that artist? However, I was concerned that it meant extra network traffic, extra load on source sites and slower scraping, all for unwanted data. @ronie pointed out that none of that was true: the XML scraper also makes requests to all sites, and the only slowness is caused by the MusicBrainz throttling (1 request per second), so fetching all the data we can while we are requesting makes sense.
However, I think the user does need a way to avoid certain data, and/or data from certain sites. For example, the artist style, mood and genre data can be an utter mess: you may not want it at all, or only from the one site that has better quality values (in your subjective view). The discographies returned for classical composers can be pretty useless too.
Skipping requests to unwanted sites could also reduce network traffic and load on those sites, and maybe even make scraping a little quicker across thousands of artists.
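As a rough sketch of what "pick the sites accessed" could look like, the snippet below queries every site by default and skips the request entirely for any site the user has switched off. The `enabled` setting, the site keys and the fetcher callables are all hypothetical, and the stub fetchers stand in for real network requests. The initial ID lookup (MusicBrainz falling back to Discogs) is assumed to have happened already and is not affected by the setting.

```python
# Hypothetical site keys; not taken from the scraper's code.
ALL_SITES = ("musicbrainz", "discogs", "theaudiodb", "fanarttv", "allmusic")


def select_sites(enabled=None):
    """Return the sites to query. None means the default: look everywhere."""
    if enabled is None:
        return list(ALL_SITES)
    return [site for site in ALL_SITES if site in enabled]


def fetch_details(mbid, fetchers, enabled=None):
    """Call only the fetchers for selected sites, so skipped sites cost no request."""
    results = {}
    for site in select_sites(enabled):
        fetch = fetchers.get(site)
        if fetch is not None:
            results[site] = fetch(mbid)
    return results


# Stub fetchers that record which sites were contacted.
calls = []


def make_stub(site):
    def fetch(mbid):
        calls.append(site)
        return {"source": site}
    return fetch


fetchers = {site: make_stub(site) for site in ALL_SITES}
art_only = fetch_details("example-mbid", fetchers, enabled={"fanarttv"})
```

Here only the fanart.tv stub is called. Passing `enabled=None` would contact all five sites, which matches the "look everywhere by default" behaviour discussed above.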
Merging Scraped Artist Data with What We Already Have
Of course, a totally different approach would be to let the scraper return everything every time, and place control over how the newly scraped data is merged with the previously scraped data into core or some Kodi settings (rather than addon settings). Would that be a good thing for consistency, or would it just confuse users to have those settings somewhere separate from the scraper?
The current approach to merging artist data is that, with the exception of MBID and name (which as identifiers are treated differently), the new values replace the old values. If a new value is empty then the stored value gets cleared. Sometimes users will refresh with fetching turned off for some fields to clean out garbage they accidentally scraped before (why did I get genres, yuck!). But to achieve that you need a scraper option not to fetch some data values. Perhaps put "none" in the preferred source list?
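A small sketch of that replace-and-clear behaviour, including the suggested "none" entry in the preferred source list. This is illustrative only: the function names and the settings shape are made up, and because the text does not say how identifiers are treated, the sketch simply leaves MBID and name untouched.

```python
# Identifiers are handled differently; this sketch just never overwrites them.
IDENTIFIERS = {"mbid", "name"}


def drop_disabled(scraped, preferred_source):
    """Remove fields whose preferred source is the hypothetical "none" setting."""
    return {
        field: value
        for field, value in scraped.items()
        if preferred_source.get(field) != "none"
    }


def apply_scrape(existing, scraped, fields):
    """Replace each non-identifier field with the newly scraped value."""
    updated = dict(existing)
    for field in fields:
        if field in IDENTIFIERS:
            continue
        # A missing or empty scraped value clears what was stored before.
        updated[field] = scraped.get(field) or None
    return updated


existing = {
    "mbid": "example-mbid",
    "name": "Artist A",
    "genre": ["yuck"],
    "biography": "old bio",
}
scraped = {"name": "Artist A", "genre": ["rock"], "biography": "new bio"}

# The user set the genre source to "none", so a refresh clears stored genres.
scraped = drop_disabled(scraped, {"genre": "none"})
refreshed = apply_scrape(existing, scraped, fields=["genre", "biography"])
```

After the refresh the biography is updated and the unwanted genres are gone, without the user having to edit the stored data by hand.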