The logic and future of Music scrapers?
#15
@ronie thanks for sharing the knowledge you gathered. A few points back (as a heavy music user).
(2017-02-13, 03:04)ronie Wrote: even though your music does not have those [Musicbrainz ID] tags, i would say this will result in pretty accurate results.
the combination of artistname + albumname is unique enough to return the correct info as far as i know.
and since the album scraper returns the artist mbid, you can get accurate results from the artist scraper as well
How unique artistname + albumname is really depends on what music you have in your collection. Classical music is particularly bad: there are multiple album artists (composer, conductor, orchestra) and many albums share a title, e.g. "Symphony No. 5". Use of the artist and albumartist tags also varies, with the composer in album artist or just as (song) artist, etc. The scraper happily returns completely the wrong album, and if "Prefer online info" is enabled it will wipe out the accurate artist names derived from tags.

Also Musicbrainz has multiple releases of the same album (artist and album title combination), some with different release dates and bonus tracks etc. Scraping by artistname + albumname just takes the first, which may be inaccurate.

With automated scraping of music without mbid tags we just do our best, but it is fallible and can mess up a perfectly accurate library. Scraping the album first (with artistname + albumname, so a better chance of accuracy) to get the artist mbid is a good thing, but just because the scraper found something does not mean it is right.
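One way to make automatic scraping less destructive is to accept a match only when it is unambiguous, and otherwise leave the tag data alone. The sketch below is illustrative only, not the existing scraper code: `pick_release`, the candidate dictionaries and their keys (`artist`, `title`, `year`, `track_count`) are all hypothetical names, and the sample data is made up.

```python
def pick_release(candidates, artist, album, year=None, track_count=None):
    """Return the single candidate that matches, or None when the match is ambiguous.

    All field names here are assumptions made for illustration.
    """
    def norm(text):
        # Compare case-insensitively and ignore extra whitespace
        return " ".join(text.casefold().split())

    matches = [
        c for c in candidates
        if norm(c["artist"]) == norm(artist) and norm(c["title"]) == norm(album)
    ]
    # Narrow the list with whatever else the local files can tell us
    if year is not None:
        matches = [c for c in matches if c.get("year") == year]
    if track_count is not None:
        matches = [c for c in matches if c.get("track_count") == track_count]

    # Zero or several survivors: do not guess, keep the tag-derived data
    return matches[0] if len(matches) == 1 else None


# Made-up sample data: two releases sharing artist and title
candidates = [
    {"id": "release-a", "artist": "Some Orchestra", "title": "Symphony No. 5",
     "year": 1963, "track_count": 4},
    {"id": "release-b", "artist": "Some Orchestra", "title": "Symphony No. 5",
     "year": 1977, "track_count": 4},
]
ambiguous = pick_release(candidates, "some orchestra", "symphony no. 5")
resolved = pick_release(candidates, "Some Orchestra", "Symphony No. 5", year=1977)
```

Here `ambiguous` is `None` because two releases share the same artist and title, while `resolved` is the second release because the year settles it.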

Quote:2) hammering servers
...
In case your music IS tagged with musicbrainz id's, the artist and album scraper could skip the initial call to musicbrainz.
that is the only optimization i can think of.
What data can the Musicbrainz API return? Do we get more than just mbid?
I think part of the original design was that re-scraping could update info that had changed e.g. someone in the community had added or edited data. So if we are getting anything else from MB then we still need to scrape even when we have a mbid.

Quote:solution: run our own musicbrainz mirror.

solution: kodi should chip in to keep the metadata sites, we use and depend upon, alive.
+1

But I think we can do some other things to reduce how often we scrape too. Even running mirrors we will soon have the problem that the existing servers do.

We could also stop fetching the track lists for an album. Users just don't care; all they want is the songs they have. I doubt that will help much, though.

Quote:3) bugs
i came across another one and added it to the first post... perhaps it's a not bug, i don't know.

for the issues i've reported so far, i have identified to code parts and added them to the first post as well.

i would really like to get bug #2 fixed (not scraping artist info, if album lookup fails).
i've patched the code locally and didn't experience any crash-boom-bangs if we simply proceed with an artist lookup in that particular case.
I'd like to check that out too with my less scraper-friendly music, and think about what happens when we have mbids for artists but not for the albums, etc. Something is niggling at the back of my mind over a reason why #2 is like it is.

BTW I'm up for working on any of the code on the music side of things. Not a clue with the Python though.

Quote:4) were to go next
for automatic scraping, perhaps we can get rid of the 'first provide a list of results' then 'get details for the first item' procedure?
ideally kodi would pass all the info it has to the album scraper (artistname, artist mbid, albumname, album mbid)
and leave it up to the album scraper to return the correct metadata right away.

same for the artist scraper, if we have them, pass both the artistname and artist mbid and let the artist scraper figure it out.

for manual scraping, we need to keep things as is of course.
Not so sure about this, but I will give it some thought. I wonder if we need to check the artists returned by the album scrape against those we already have (from tags) before we scrape those artists, or I guess the scraper could do that.
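The check against tag artists could be sketched roughly as below, wherever it ends up living. This is only an illustration under assumptions: `norm_name`, `split_scraped_artists` and the `name`/`mbid` keys are hypothetical, and the mbid values are placeholders rather than real identifiers.

```python
import unicodedata


def norm_name(name):
    """Case-fold, strip accents and collapse whitespace so near-identical names compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def split_scraped_artists(tag_artists, scraped_artists):
    """Split scraped artists into those confirmed by the tags and those that are suspect."""
    known = {norm_name(name) for name in tag_artists}
    confirmed, suspect = [], []
    for artist in scraped_artists:
        if norm_name(artist["name"]) in known:
            confirmed.append(artist)
        else:
            suspect.append(artist)
    return confirmed, suspect


# Made-up sample data
tag_artists = ["Example Ártist", "Some Orchestra"]
scraped = [
    {"name": "example artist", "mbid": "placeholder-1"},
    {"name": "Completely Different Band", "mbid": "placeholder-2"},
]
confirmed, suspect = split_scraped_artists(tag_artists, scraped)
```

Only the `confirmed` artists would then go on to an artist scrape using the returned mbid; anything in `suspect` suggests the album match itself may be wrong.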

There is also the (re-)scraping of a filtered list of artists or albums from "Query Info for all". Inaccurate returns mean that I do scrape artists but don't even try albums for some kinds of music.

A new thing I would like to see on the scraper side is return of disambiguation data.

When manual scraping of a single item (from refresh on info dialog) without mbids is inconclusive we get a list of possibles but no disambiguation data to help spot which is the right one. Can we get that back from Musicbrainz? Olympia mentioned that the scraper used to do it, but stopped for some reason.
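To illustrate what the list of possibles could look like with that data, here is a small sketch. It assumes each candidate arrives as a dictionary with a `name` and optional `disambiguation`, `country` and `year` keys; those keys and `label_for` are assumptions for illustration, not what the current scraper returns.

```python
def label_for(candidate):
    """Build a list label, appending any distinguishing details that are present."""
    extras = [candidate.get(key) for key in ("disambiguation", "country", "year")]
    extras = [str(value) for value in extras if value]
    if not extras:
        return candidate["name"]
    return "{} ({})".format(candidate["name"], ", ".join(extras))


# Made-up sample data: two artists sharing a name
possibles = [
    {"name": "Example Band", "disambiguation": "UK folk group", "country": "GB"},
    {"name": "Example Band", "disambiguation": "US punk band", "year": 1994},
    {"name": "Example Band"},
]
labels = [label_for(c) for c in possibles]
```

With labels like these the user has something to go on when several entries share the same name.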

Also, with artist discography we need to start saving the mbids if they are (or could be) returned. Currently we do a messy match by name when the info dialog is shown, which makes no sense to me at all. That said, the discography for classical music is totally useless - care to guess how many albums Beethoven has on Musicbrainz?
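Matching on a saved mbid first, with the name only as a fallback, might look something like the sketch below. The structures are hypothetical (`match_discography`, and the `title`/`mbid` keys) and are not how the library stores discography today.

```python
def match_discography(discography, library_albums):
    """Pair each discography entry with a library album, by mbid first, then by title."""
    by_mbid = {a["mbid"]: a for a in library_albums if a.get("mbid")}
    by_title = {}
    for album in library_albums:
        # Keep the first album seen for each title; a name match is only a fallback
        by_title.setdefault(album["title"].casefold(), album)

    pairs = []
    for entry in discography:
        album = by_mbid.get(entry["mbid"]) if entry.get("mbid") else None
        if album is None:
            album = by_title.get(entry["title"].casefold())
        pairs.append((entry["title"], album))
    return pairs


# Made-up sample data with placeholder ids
library = [
    {"title": "Album One", "mbid": "id-1"},
    {"title": "Album Two"},
]
discography = [
    {"title": "Album One (Deluxe)", "mbid": "id-1"},  # title differs, mbid agrees
    {"title": "album two"},                           # no mbid, falls back to title
    {"title": "Not In Library"},
]
pairs = match_discography(discography, library)
```

The first entry is matched even though the titles differ, which is exactly the case a match by name gets wrong.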

Hope at least some of this is helpful.


Messages In This Thread
RE: The logic of Music scrapers? - by ronie - 2017-02-08, 00:14
RE: The logic of Music scrapers? - by jjd-uk - 2017-02-08, 11:23
RE: The logic of Music scrapers? - by ronie - 2017-02-13, 03:04
RE: The logic and future of Music scrapers? - by DaveBlake - 2017-02-13, 16:01
RE: The logic of Music scrapers? - by ronie - 2017-02-13, 03:12
RE: The logic of Music scrapers? - by ronie - 2017-02-13, 03:28