The logic and future of Music scrapers?
#31
@ronie i wonder about the semantics for nfo files. i do not think it's wise to pass the whole thing as a parameter which is what would happen if i kept current logic - you potentially run out of argv space rather quickly.

as i see it we have two choices;
1) core resolves the path to the nfo file and this is passed to the add-on
or
2) we leave it up to the scraper to identify the nfo file and only pass entity path (problematic for artists).
Reply
#32
EDITED as I answered my own questions, so came up with more.

Q: Do we use album artist mbid(s) when doing an album search at Musicbrainz if we have them but not the album mbid?

A: No.
But would Musicbrainz search support this?

Q: Do we use artist names individually when the album is a collaboration e.g. multiple album artist names, or just the album artist description string (which may not have the names in the same order or syntax)? For example:
"Orchestral works" "Georg Friedrich Händel; The English Concert, Trevor Pinnock"
"Riding with the King" " Eric Clapton & B. B. King"

A: we use the album artist description string.
But would Musicbrainz search support using individual names, for better accuracy?
Reply
#33
(2017-02-15, 16:12)DaveBlake Wrote:
Quote:while testing things i found out there is a very popular skin helper addon in our repo that is also making a lot of calls to musicbrainz
when you are scraping your music collection.
if you have this addon installed (it's a dependency of many skins) you're very likely to end up with many items that failed to scrape (due to throttling), as the addon is basically doubling up the number of calls to musicbrainz.
Oh dear! What is it fetching? Does it do it even when we have mbid from tags? Could we look at optimising that too?

i don't know. when an annoying addon gets in my way, i simply delete it from my system :-)
will notify the addon dev about the issue though.

(2017-02-15, 16:12)DaveBlake Wrote: Not sure why you pointed at the code that you did, and think you may have misread? The code is about what we do with the data we have scraped, (correctly) managing what data derived from tags gets overwritten.
When calling the scraper here https://github.com/xbmc/xbmc/blob/99c25f....cpp#L1097
and https://github.com/xbmc/xbmc/blob/99c25f....cpp#L1319
we use the mbid from tags when we have it.

my comment isn't about mbid's from your tags, but about the artist mbid that the album scraper will return.
currently it gets discarded, unless you enable 'prefer online info'.

the two code blocks i'm referring to prevent the artist mbid from the album scraper to be passed along to the artist scraper.
Do not PM or e-mail Team-Kodi members directly asking for support.
Always read the Forum rules, Kodi online-manual, FAQ, Help and Search the forum before posting.
Reply
#34
(2017-02-15, 16:58)DaveBlake Wrote: Q: Do we use album artist mbid(s) when doing an album search at Musicbrainz if we have them but not the album mbid?

A: No.
But would Musicbrainz search support this?

yup, musicbrainz search supports all possible combinations:
- artist name + album name
- artist name + album mbid
- artist mbid + album name
- artist mbid + album mbid

hence my previous suggestion that kodi should pass ALL available info (both names and mbid's) and let the scraper figure it out.
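
The four combinations above can be sketched as one query builder. This is only an illustration, not the scraper's actual code: the function names are made up, it prefers an mbid over a name when both are passed, and it assumes the album mbid is a release-group id (`rgid`). If the tag holds a release id instead, the `reid` search field would be the one to use.

```python
from urllib.parse import urlencode

# MusicBrainz web service endpoint for release-group searches.
WS_ROOT = "https://musicbrainz.org/ws/2/release-group/"


def _quote(value):
    # Escape backslashes and double quotes so the value survives as a Lucene phrase.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_album_query(album=None, album_mbid=None, artist=None, artist_mbid=None):
    """Build a Lucene query from whatever Kodi passed (hypothetical helper)."""
    terms = []
    if album_mbid:
        terms.append("rgid:" + album_mbid)  # assumption: release-group mbid
    elif album:
        terms.append("releasegroup:" + _quote(album))
    if artist_mbid:
        terms.append("arid:" + artist_mbid)
    elif artist:
        terms.append("artist:" + _quote(artist))
    if not terms:
        raise ValueError("need at least one name or mbid")
    return " AND ".join(terms)


def build_album_url(**kwargs):
    """Return the full search URL, asking for a JSON response."""
    params = {"query": build_album_query(**kwargs), "fmt": "json"}
    return WS_ROOT + "?" + urlencode(params)
```

Calling `build_album_url(album="Riding with the King", artist="Eric Clapton")` gives a name + name search, while passing `artist_mbid=` instead swaps the artist term for an `arid:` lookup.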

(2017-02-15, 16:58)DaveBlake Wrote: Q: Do we use artist names individually when the album is a collaboration e.g. multiple album artist names, or just the album artist description string (which may not have the names in the same order or syntax)? For example:
"Orchestral works" "Georg Friedrich Händel; The English Concert, Trevor Pinnock"
"Riding with the King" " Eric Clapton & B. B. King"

A: we use the album artist description string.
But would Musicbrainz search support using individual names, for better accuracy?

searching for "Riding with the King" + "Eric Clapton & B. B. King":
http://musicbrainz.org/ws/2/release-grou...0B.%20King

searching for "Riding with the King" + "Eric Clapton":
http://musicbrainz.org/ws/2/release-grou...%20Clapton

searching for "Riding with the King" + "B. B. King":
http://musicbrainz.org/ws/2/release-grou...0B.%20King

the first two provide a correct match, the last one doesn't.
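
One way to act on this is to try the full album artist string first and fall back to the individual names. A rough sketch, assuming the names are separated by `;`, `,` or `&` (the helper name is hypothetical, and splitting on commas is naive, since it would break up a name like "Earth, Wind & Fire"):

```python
import re

# Separators assumed to join collaborating artists in a description string.
_SEPARATORS = re.compile(r"\s*(?:;|,|&)\s*")


def artist_candidates(description):
    """Return search strings to try in order: full string, then each name."""
    full = description.strip()
    candidates = [full]
    for part in _SEPARATORS.split(full):
        if part and part not in candidates:
            candidates.append(part)
    return candidates
```

For " Eric Clapton & B. B. King" this yields the full string, then "Eric Clapton", then "B. B. King", so the scraper can stop at the first query that returns a confident match.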
Reply
#35
Found that niggle over how automatic scraping works!
I actually don’t like the way that automatic scraping happens as we add each album.

Scraping albums and artists after the tag processing for all the song files in the library has been done, rather than as each album as it is added, would produce more accurate results.

Music is often tagged to a mixed standard - some albums have mbid tags, some don't. Once all the music files have been scanned, the tag processing will have set the mbid for an artist even if it was only there on one song. That may not be on the first album by those artists, and so automatic scraping as it is would scan the artists using name alone, fetch details etc., possibly the wrong ones, only for tag scanning of the next album to provide the mbid. Result: mbid held along with wrong artist data.

Doing the scraping after all the scanning would also mean that we could use the artist mbids on an album search even if that album’s song files didn’t have any mbid tags. Again better accuracy.

Speed is also an issue. Get all the tags scanned and all the artists, albums and songs into the library first and usable, then take time polling the servers for the additional information. Could even repeat after server timeouts.
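
The scan-first, scrape-later idea can be sketched in two passes. This is a toy model, not Kodi's scanner: songs are plain dicts with hypothetical `artist` and `artist_mbid` keys.

```python
def collect_artist_mbids(songs):
    """Pass 1: remember an artist's mbid if any of their songs carried one."""
    mbids = {}
    for song in songs:
        name = song["artist"]
        # Keep an mbid found earlier, otherwise take this song's (may be None).
        mbids[name] = mbids.get(name) or song.get("artist_mbid")
    return mbids


def plan_scrape_jobs(songs):
    """Pass 2: one job per artist, looked up by mbid whenever one is known."""
    jobs = []
    for name, mbid in sorted(collect_artist_mbids(songs).items()):
        jobs.append({"artist": name, "mbid": mbid,
                     "lookup": "mbid" if mbid else "name"})
    return jobs
```

Because the jobs are only built after every file has been read, an mbid found on the last song scanned is still used for that artist's lookup.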
Reply
#36
(2017-02-15, 16:55)ironic_monkey Wrote: @ronie i wonder about the semantics for nfo files. i do not think it's wise to pass the whole thing as a parameter which is what would happen if i kept current logic - you potentially run out of argv space rather quickly.

as i see it we have two choices;
1) core resolves the path to the nfo file and this is passed to the add-on
or
2) we leave it up to the scraper to identify the nfo file and only pass entity path (problematic for artists).

perhaps i don't understand how it is supposed to work to begin with...
i assumed kodi would extract the url from the .nfo file and pass it to the scraper using the getdetails 'url' argv
the scraper would then only fetch metadata from this specific service.
Reply
#37
(2017-02-15, 17:28)ronie Wrote:
(2017-02-15, 16:12)DaveBlake Wrote: Not sure why you pointed at the code that you did, and think you may have misread? The code is about what we do with the data we have scraped, (correctly) managing what data derived from tags gets overwritten.
When calling the scraper here https://github.com/xbmc/xbmc/blob/99c25f....cpp#L1097
and https://github.com/xbmc/xbmc/blob/99c25f....cpp#L1319
we use the mbid from tags when we have it.

my comment isn't about mbid's from your tags, but about the artist mbid that the album scraper will return.
currently it gets discarded, unless you enable 'prefer online info'.

the two code blocks i'm referring to prevent the artist mbid from the album scraper to be passed along to the artist scraper.
Ah I see.
In that case #1, mixes into my thoughts on #3: I will definitely have a look at saving the mbids we get via scraping, been on my job list for ages.

But what I just said above about my automatic scraping niggle has relevance too. If we have an mbid somewhere in our song files I want to use that in preference to what we may get back using just names by scraping part way through scanning.
Reply
#38
@spiff can we get rid of the hardcoded (in core) musicbrainz time-out for python based scrapers?
https://github.com/xbmc/xbmc/blob/e8a246....cpp#L1482

core does not know if the python scraper will be querying musicbrainz or not.
i'm handling the necessary timeouts in the scraper itself.
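
For illustration, scraper-side throttling could look like the sketch below. It is not the code of the actual scraper: the class name is made up, and the one-second default reflects the roughly one request per second that MusicBrainz asks clients to respect.

```python
import time


class Throttle:
    """Space out calls so they are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The scraper would call `throttle.wait()` right before each MusicBrainz request and skip it for other services, which is something core cannot decide on the scraper's behalf.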
Reply
#39
I don't quite understand the context but
(2017-02-15, 16:55)ironic_monkey Wrote: 2) we leave it up to the scraper to identify the nfo file and only pass entity path (problematic for artists).

Yes, artists often do not have a unique path or common folder in a music collection. Kodi already gets in a mess with art and full NFO files by trying to assume there is always one. If you have many artists, even creating fake folders with no music inside would be unmanageable.

Relationship of artist to path is many to many.
Reply
#40
it's the scraper doing the extraction ronie, kodi core has no idea which urls are supported, that's up to the scraper to decide. we simply passed the *contents* of the nfo file through the relevant scraper function. hence the problem with mapping to the current API, because entry point is handed the nfo contents.
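
To make that concrete, here is a sketch of what the scraper side might do with the nfo contents it is handed. The function name and return shape are hypothetical; only the MusicBrainz URL layout (entity type followed by an mbid) is taken as given.

```python
import re

# Matches MusicBrainz entity URLs such as .../artist/<mbid>.
_MB_URL = re.compile(
    r"https?://(?:www\.)?musicbrainz\.org/(release-group|release|artist)/"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def find_musicbrainz_url(nfo_contents):
    """Return the first supported URL found in the nfo text, or None."""
    match = _MB_URL.search(nfo_contents)
    if match is None:
        return None
    return {
        "url": match.group(0),
        "entity": match.group(1).lower(),
        "mbid": match.group(2).lower(),
    }
```

A scraper for another service would ship its own pattern, which is why core cannot decide which urls are supported.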

currently the artist path is taken as the deepest common path for all songs by an artist. it works well for an artist/album type directory layout, but it won't work very well in general. but that's what my music collection had so that's what i wrote the code to do ;)
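
The deepest-common-path rule can be sketched as follows. This is an illustration rather than the core implementation, and it assumes plain POSIX-style file paths (Kodi also deals with URL-style paths such as smb://, which this does not handle).

```python
import posixpath


def artist_folder(song_paths):
    """Return the deepest folder that contains every song by one artist."""
    folders = [posixpath.dirname(path) for path in song_paths]
    if not folders:
        return None
    return posixpath.commonpath(folders)
```

With songs under "/music/Eric Clapton/Slowhand" and "/music/Eric Clapton/Unplugged" the result is "/music/Eric Clapton". Add one song stored under another artist's folder and the result collapses to "/music", which is the failure mode for collaborations.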
Reply
#41
(2017-02-16, 11:29)ironic_monkey Wrote: currently the artist path is taken as the deepest common path for all songs by an artist. it works well for an artist/album type directory layout, but it won't work very well in general. but that's what my music collection had so that's what i wrote the code to do ;)
So it was you :)

An artist/album type directory layout is very common, of course the flaw is collaborative albums where there are multiple album artists e.g. "Orchestral works" by "Georg Friedrich Händel; The English Concert, Trevor Pinnock" or "Riding with the King" by " Eric Clapton & B. B. King". You might have a folder for composers e.g. Händel, but The English Concert and Trevor Pinnock will be album artists for albums by other composers. Similarly say you have other albums by Eric Clapton, or B. B. King, wherever you put "Riding with the King" it will screw up the artist art and NFO.

Then there is the question of how to load NFO and art for artists that only feature on songs but don't have an album (or folder), and from v17, where can the composers, producers, lyricists, musicians etc. that Kodi now also has in the artist table put their NFO file?

As a solution I am considering extending NFO processing so it was more like a general import and can handle many artists in one NFO. Also when loading actually check the artist name and mbid (if it exists) match. It would mean that you can't change artist name (or identity) via NFO, but that could be a good thing. Then the user could set where their multi-artist NFO files are held, in addition to those sprinkled among their music folders. What do you think @spiff?
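
A rough sketch of what such an import check could look like. The `<artists>` wrapper element is purely hypothetical (no such multi-artist format exists today); `<name>` and `<musicBrainzArtistID>` follow the existing artist NFO tags. An entry is accepted only if the name is already in the library and the mbids do not disagree.

```python
import xml.etree.ElementTree as ET


def match_nfo_artists(nfo_xml, library):
    """Split NFO artist entries into (accepted, rejected) name lists.

    library maps artist name -> mbid (or None when the library has no mbid).
    """
    accepted, rejected = [], []
    root = ET.fromstring(nfo_xml)
    for node in root.iter("artist"):
        name = (node.findtext("name") or "").strip()
        mbid = (node.findtext("musicBrainzArtistID") or "").strip() or None
        known = name in library
        lib_mbid = library.get(name)
        # Only compare mbids when both sides actually have one.
        mbid_ok = mbid is None or lib_mbid is None or mbid == lib_mbid
        (accepted if known and mbid_ok else rejected).append(name)
    return accepted, rejected
```

Since unknown names are rejected rather than created, the import cannot rename an artist or change its identity, as described above.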

What I don't know yet is how we check that the deepest common path for all songs by an artist algorithm has failed to find a folder unique to just that artist. The problems all start when the algorithm results in same folder for more than one artist. Store an artist path relationship in the db?
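
One possible check, sketched without any db changes: once a folder has been computed for every artist, group the artists by folder and flag every folder claimed by more than one. The function name and the input mapping are assumptions for the example.

```python
from collections import defaultdict


def shared_artist_folders(artist_paths):
    """Return {folder: [artists]} for folders resolved for several artists."""
    by_folder = defaultdict(list)
    for artist, folder in artist_paths.items():
        by_folder[folder].append(artist)
    return {folder: sorted(names)
            for folder, names in by_folder.items() if len(names) > 1}
```

Any artist listed in the result has no folder of its own, so art and NFO found there should not be attached to that artist automatically.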
Reply
#42
yup i know about the flaws but i don't do compilation albums, they are evil ;P

the thing you are planning is actually sort of already available through the library export / import functionality. some refinement around this and you have your stuff. i do not think it's worth complicating it beyond that, no need to store such a path in db imo.
Reply
#43
(2017-02-15, 18:00)DaveBlake Wrote: Found that niggle over how automatic scraping works!
I actually don’t like the way that automatic scraping happens as we add each album.

Scraping albums and artists after the tag processing for all the song files in the library has been done, rather than as each album as it is added, would produce more accurate results.

Music is often tagged to a mixed standard - some albums have mbid tags, some don't. Once all the music files have been scanned, the tag processing will have set the mbid for an artist even if it was only there on one song. That may not be on the first album by those artists, and so automatic scraping as it is would scan the artists using name alone, fetch details etc., possibly the wrong ones, only for tag scanning of the next album to provide the mbid. Result: mbid held along with wrong artist data.

Doing the scraping after all the scanning would also mean that we could use the artist mbids on an album search even if that album’s song files didn’t have any mbid tags. Again better accuracy.

Speed is also an issue. Get all the tags scanned and all the artists, albums and songs into the library first and usable, then take time polling the servers for the additional information. Could even repeat after server timeouts.

yup, sounds like a plan. the more detailed info we have on an artist, the better.

for albums, it might also speed up scraping if we pass all albums by an artist to the scraper as a batch, instead of passing them one by one.
Reply
#44
@spiff thanx for the additional info on .nfo files.
i didn't realize the scraper had to parse the .nfo files as well.

(2017-02-15, 16:55)ironic_monkey Wrote: @ronie i wonder about the semantics for nfo files. i do not think it's wise to pass the whole thing as a parameter which is what would happen if i kept current logic - you potentially run out of argv space rather quickly.

as i see it we have two choices;
1) core resolves the path to the nfo file and this is passed to the add-on
or
2) we leave it up to the scraper to identify the nfo file and only pass entity path (problematic for artists).

well i think i would prefer 1, that sounds like the easiest way to me.
Reply
#45
Hi ronie and others,

As briefly discussed on Github, I hereby share my ideas for the scrapers.

Currently it works as follows:

1. Kodi scraper handles all basic information, including reading tags, folder structure and a few properties from online sources.
2. All kinds of addons provide additional information, e.g. artwork or additional metadata: addons like the cdart manager and my own skinhelper scripts.

Both the Kodi scraper and addons use online sources inefficiently: the online sources are scraped even if the info has already been scraped once.

The above statement is true not only for the music library but also for the video database, with the difference that the video database supports scraping the additional artwork directly into the art table, while the music db supports this internally but doesn't expose it in the json API.
End result: the Kodi database provides basic info (probably enough for most users btw) and addons provide additional info in window properties, files or whatsoever. So basically there are now 2 (or more) places where metadata of files can be stored.

I hope you get the point I'm making here, and of course yes, I know I am one of the devs that created the confusion by creating the skin helper addons.

This is what I have in mind for the future:

1. Make the Kodi scraper-engine the default for all scraping actions (metadata retrieval).
2. Make sure it supports multiple "modules" / sources. For example a basic scraper which is enabled by default and some additional scrapers like this:
- grab artwork from local directories and write it to the art table
- grab artwork from online sources like fanart.tv
- grab additional metadata e.g. ratings from last.fm etc.
- etc.
3. A user can enable some of the additional scrapers if they actually want that metadata to be scraped to the database.
4. All metadata is available in the kodi database and available as Listitem properties, meaning usable for all skins and scenarios, no ugly window properties needed which not only overcomplicates stuff but also needs system resources to monitor listitems in the background.
5. The kodi default scraper is responsible for retrieving the correct ID's such as musicbrainz ID's, IMDB etc. No other addons should have to replicate this logic as it will only cause confusion.

So basically what I'm suggesting is the possibility to have multiple "scraper-modules" for each scenario. For example a user can activate the "animated artwork" module for movies besides the default scraper. These scraper-modules will be special python modules (or C++) which can be created just the same as other kodi addons.

To achieve this level of flexibility I think there needs to be a database table (and internal logic) in the same way as how the art table works.
I can write any key/value in the art table and its accessible as listitem property.
Maybe keep the core info for a media item as separate named fields in the database structure and have an additional table accepting key/value strings.
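
A minimal sketch of such a key/value table, using sqlite3. The table name `extra_info` and the helper functions are hypothetical; the `media_id` / `media_type` pair is borrowed from how the art table identifies the item a row belongs to.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database and make sure the hypothetical extra_info table exists."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS extra_info (
               media_id   INTEGER NOT NULL,
               media_type TEXT    NOT NULL,
               key        TEXT    NOT NULL,
               value      TEXT,
               PRIMARY KEY (media_id, media_type, key))"""
    )
    return db


def set_info(db, media_id, media_type, key, value):
    """Store one key/value pair, overwriting any earlier value for that key."""
    db.execute(
        "INSERT OR REPLACE INTO extra_info VALUES (?, ?, ?, ?)",
        (media_id, media_type, key, value),
    )


def get_info(db, media_id, media_type):
    """Return all extra key/value pairs for one media item as a dict."""
    rows = db.execute(
        "SELECT key, value FROM extra_info WHERE media_id = ? AND media_type = ?",
        (media_id, media_type),
    )
    return dict(rows.fetchall())
```

An optional scraper module would call `set_info(db, 42, "album", "lastfm.rating", "8.1")`, and the dict from `get_info` is what could be exposed as listitem properties.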

I think this approach goes in much the same direction that Montelesse is taking with enhancing the scraper to support sources other than the filesystem. What I'm suggesting is that you have the default scraper (module) which identifies the files as media items. That's the basic stuff; atm this is done by scanning files on the filesystem and maybe in the future this can be done in other ways too with Montelesse's work.

When the basic info is there, the additional/optional scrapers can provide additional metadata the user wants to have, such as ratings from other sources, additional artwork etc. etc.

I understand this is a pretty big project and needs a lot of thought and work. If you feel like this is some way forward, I will help out where I can.

For now, I have rewritten my skin helper addons (as discussed on Github with several team members) because the old one caused some issues.

As a first attempt to create something universal I created the new metadata module in which I placed all scraping logic and caching. If for example 2 addons need metadata from TVDB, instead of grabbing the info themselves, they can use the module and retrieve the (cached) result.
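
The shared-cache idea, reduced to a sketch. This is not the actual metadata module: the class name, the `fetch(source, key)` callback and the one-hour lifetime are all assumptions made for the example.

```python
import time


class MetadataCache:
    """Serve repeated lookups from memory instead of hitting the source again."""

    def __init__(self, fetch, ttl=3600.0, clock=time.monotonic):
        self._fetch = fetch      # callable(source, key) doing the real request
        self._ttl = ttl          # seconds a cached result stays valid
        self._clock = clock
        self._entries = {}       # (source, key) -> (stored_at, value)

    def get(self, source, key):
        """Return cached metadata, fetching it only when missing or expired."""
        now = self._clock()
        entry = self._entries.get((source, key))
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._fetch(source, key)
        self._entries[(source, key)] = (now, value)
        return value
```

Two addons asking for the same TVDB item through one `MetadataCache` instance trigger a single online request; the second call gets the stored result.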

This metadata module is a very rough first step, maybe to use as an "optional/enhanced" scraper?

So, to get back on-topic. I like what is discussed here to give the music scraping some more thought and I can help out where I can, just ask. As stated in the above, I think this might even be elevated into a higher level to enhance all scraping logic with default and optional modules.
Reply


