Kodi Community Forum
Efficient music artist scraping - Printable Version


Efficient music artist scraping - DaveBlake - 2016-03-17

For more efficient artist scraping I think we need to be more selective over which artists we search the online databases for. There has been previous discussion over better targeting, selecting the appropriate online database for the kind of music etc., but this is more fundamental.

Doing some testing with a music library containing a single album, I noticed that scraping artist info was only returning 3 out of 10 artists. They all had an mbid and were in TADB, but any scraping session would result in lots of 503 errors and just a random 3 artists being found. It dawned on me that scraper resources are finite, so Kodi would do best to focus its requests on the artists that users most want info for.

Currently the user can either manually scrape one artist at a time, manually query information for all artists, or automatically query all (new) artists when the library is updated. All artists really is all, so even if the user has the albumartistonly flag set and never sees those artists that do not have albums, the scraper will have attempted to get metadata for them - what a waste of time, and a lot of unnecessary online queries.

The situation is even worse without mbids: the song artists will probably include combined names e.g. artist1 feat. artist2, and attempting to find those online is futile.

At the very least, if the albumartistonly flag is set then only album artists should be scraped?
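As a rough illustration of that first step (not Kodi's actual code), the selection could look like the Python sketch below. The `Artist` class, its fields and the `artists_to_scrape` function are hypothetical names made up for this example; only the idea of honouring the albumartistonly flag comes from the post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Artist:
    # Hypothetical stand-in for a library artist record.
    name: str
    mbid: Optional[str]
    is_album_artist: bool


def artists_to_scrape(artists: List[Artist], album_artists_only: bool) -> List[Artist]:
    """Return the artists worth querying online.

    When the (assumed) album-artists-only flag is set, song-only artists
    are skipped because the user never sees them in the artist list.
    """
    if not album_artists_only:
        return list(artists)
    return [a for a in artists if a.is_album_artist]


library = [
    Artist("B.B. King", "mbid-0001", True),
    Artist("Eric Clapton", "mbid-0002", True),
    Artist("artist1 feat. artist2", None, False),  # song artist only
]

selected = artists_to_scrape(library, album_artists_only=True)
print([a.name for a in selected])
```

With the flag set, the combined-name song artist never reaches the scraper, so no request is wasted on it.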

More radically, how about letting the user select which artists get scraped using smart playlist rules?

Carried over from video, the current idea is that scraping is performed for a media source e.g. this folder contains all my movies, this my TV series etc. With music a lot of users will have one source e.g. this folder contains my music. Beneath that they may have subfolders for family member, genre, artist, album etc., but no matter the hierarchy not every artist will have a folder and some may have more than one. For example, albums can have more than one album artist (composer, conductor, orchestra), some artists do not have any albums but just appear on other people's, and the same artist could appear in both my music collection and my wife's. For scraping, Kodi attempts to determine a path to associate with each artist: it takes the lowest common denominator of the paths for all the songs for albums by that artist, i.e. the deepest folder they all share. If the artist is an album artist with all their music under one folder then this is fine. But things go wrong when that is not the case, and the common folder can end up as root.
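To make the path problem concrete, here is a minimal Python sketch of the "common folder" idea. It is an illustration only and does not reproduce Kodi's implementation; the function name and the example paths are invented, and `posixpath.commonpath` from the standard library stands in for whatever Kodi does internally.

```python
import posixpath
from typing import List


def common_artist_folder(song_paths: List[str]) -> str:
    """Return the deepest folder shared by all of an artist's song files."""
    folders = [posixpath.dirname(p) for p in song_paths]
    return posixpath.commonpath(folders)


# Album artist with all their music under one folder: the result is useful.
tidy = [
    "/music/Eric Clapton/Slowhand/01 Cocaine.flac",
    "/music/Eric Clapton/Unplugged/01 Signe.flac",
]

# Artist whose songs are spread over other people's albums:
# the common folder collapses to the root of the music source.
scattered = [
    "/music/B.B. King/Riding with the King/01 Riding with the King.flac",
    "/music/Compilations/Blues Classics/07 Some Track.flac",
]

print(common_artist_folder(tidy))       # /music/Eric Clapton
print(common_artist_folder(scattered))  # /music
```

The second result is the failure case described above: a path that says nothing about the artist.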

There is no guaranteed direct relationship between an artist and a folder, so why use path, and only path, to determine which artists are scraped and what scraper settings to use? Sometimes path is appropriate, but it is often not definitive.

Remember with music, unlike video, we are not scraping files to identify recordings and build a video library, we are scraping the music library contents. The music library has already been populated with artists and albums using the music file tags.

I want to work on this and make scraping more resource efficient, rather than batter the data servers with unnecessary requests, but I don't want to waste my time if others disagree with my ideas. So opinions please. Or perhaps you see issues; I would like to hear those too.


RE: Efficient music artist scraping - Powerhouse - 2016-03-17

Dave, I really appreciate all that you are doing regarding Music, really, great job, and keep it up.

My personal experience with Kodi is to NOT use the built in scrapers at all. There are far too many times that they have stopped working (for example, the TADB scraper is currently broken in Kodi 15.2, see this thread http://forum.kodi.tv/showthread.php?tid=260530).

Instead, I use programs like MediaElch for scraping Movies, TV Shows, and now Music. That way it's one and done, and that info will always be on my Server under the correct folder.

One of the other issues I have with using Kodi's built in scrapers is that you have to do this for every install of Kodi you use. That is tons of network traffic (probably one of the reasons for TADB being broken). Again, using a scraper like MediaElch puts all the information within each and every folder for all your Movies, TV Shows and Music. Which means when you go to add a new Kodi device, it will pull in all that information straight from my server.


RE: Efficient music artist scraping - DaveBlake - 2016-03-17

All good points Powerhouse, thank you. Music improvements are not going as fast as I would like, but I'll keep at it.

Kodi is certainly generating unnecessary network traffic repeating queries. I guess you could export and import info between installations, but I know this process is flawed.

One question: what does MediaElch do about artist info for those that don't have an obvious unique folder in your music collection? Loading locally held data is also something I think needs attention. It works, you clearly use it, but as I understand it there are weaknesses e.g. handling multiple CD sets with varied info.


RE: Efficient music artist scraping - Powerhouse - 2016-03-17

So far, my experience using MediaElch is just with Artists and their Albums. Since it uses the information from MusicBrainz (and several other sites for Artwork), these work great. But since lots of my compilations are not true compilations that MusicBrainz understands (Billboard's top 100 from 1952, 1953...2015), I leave these out of MediaElch scanning. Don't get me wrong, there are compilations in MusicBrainz (and these work just like a normal Artist), but for custom compilations your best bet is to create your own .NFO files and download artwork (which will show up in MediaElch once you put this information in the folder).

So the rule with MediaElch is, if it's in MusicBrainz, you will get the info (Artist.nfo, Album.nfo, posters, fanart, etc.).


RE: Efficient music artist scraping - Powerhouse - 2016-03-17

I should also mention, you don't have to have your music tagged with MusicBrainz to use MediaElch (I have been going through my library removing all tags, and using MusicBrainz Picard to re-tag in ID3 version 2.4. This has been a slow process for me, but I'm a quarter of the way there).

MediaElch, will search MusicBrainz for a match to your Album/Artist, so even if you don't have them tagged correctly, you can still choose the correct Album/Artist (at least according to MusicBrainz). The problem with this is (and if you've used MusicBrainz Picard at all), you realize there may be several versions of your Album that contain extra tracks that you may not have (a Japan version of an Album that includes 2 extra tracks, yet you have the European version, that has 1 extra track, but is different from the Japan extra tracks).

This is the reason I'm going through all my music using MusicBrainz Picard. I want everything to be as close to perfect in Kodi as possible. Which is also why I want you to continue your work Dave... 8)


RE: Efficient music artist scraping - DaveBlake - 2016-03-17

So you have not discovered how, say, an album with multiple album artists is handled by them? For example "Riding with the king" by B.B.King and Eric Clapton, or any classical music album with composer, orchestra and conductor. Even if you keep your music in an Artist>Album folder structure there will be artists with no folder. Extra folders I guess, one for every artist, to contain the NFO file even if there are no music files stored there? Or more than one artist in each artist.nfo file?

Just curious, but I don't really want to turn this into a MediaElch thread!!!


RE: Efficient music artist scraping - Powerhouse - 2016-03-17

Actually, I will have to check that out, and report back to you. I know for Music Videos, I create the .NFO, and for ones with Artist 1 feat. Artist 2, I create a separate Artist field for each of them in the .NFO. But I forget how this works in MediaElch, so I will report back once I get off of work.


RE: Efficient music artist scraping - Powerhouse - 2016-03-17

I do remember the last time I removed and re-added Music into Kodi, that there were several Artists listed, that had a single song (when you clicked on them) from someone else's album. Before you started working on Music, Kodi (then XBMC) would create empty folders (named the Artists name) in my Music folder (on my HDD), that would contain Artist.nfo, and usually a Poster of them. This was really frustrating, as I would have hundreds of Artists that had no Music in their folder (and various family members would tell me this and ask why, to which I had no response).


RE: Efficient music artist scraping - black_eagle - 2016-03-17

(2016-03-17, 20:02)DaveBlake Wrote: So you have not discovered how, say, an album with multiple album artists is handled by them? For example "Riding with the king" by B.B.King and Eric Clapton, or any classical music album with composer, orchestra and conductor. Even if you keep your music in an Artist>Album folder structure there will be artists with no folder. Extra folders I guess, one for every artist, to contain the NFO file even if there are no music files stored there? Or more than one artist in each artist.nfo file?

Just curious, but I don't really want to turn this into a MediaElch thread!!!

Not very well is the answer !! Although MediaElch is great for video, music support is a relatively recent addition and, checking with "Riding with the King", although the review is correct, the MBID is B.B. King's, as is the artist. ME also has problems with my Now! collection. I know full well that they have MBIDs because I've tagged them all with Picard, but ME doesn't identify a single one.

Part of the problem is as you described earlier - that scraping tends to depend upon a particular directory hierarchy (this appears to be how ME approaches things) but that it is perfectly possible to have artists within the music library that have no associated directory. Probably hundreds in mine !!

As a start, I agree that with albumartistsonly set, only album artists should be scraped. The GUI already allows for obtaining information for a single artist should a user require information on a non-scraped artist. Not only would this greatly reduce the load on metadata sites, it would also make scraping a library much faster.

I'm curious about your suggestion of smart playlist rules to decide which artists get scraped - how do you envisage this working ?


RE: Efficient music artist scraping - DaveBlake - 2016-03-17

Quote:I'm curious about your suggestion of smart playlist rules to decide which artists get scraped - how do you envisage this working ?

It leads from the simple fact that the artist scraper just works through a list of artists (mbid, or just name) from the library. So to reduce network traffic etc. rather than scrape all or one artist, it is just a matter of giving it a smaller selected list of artists. That leads to how do we choose those artists? There is no reason why we can't use all the selection methods available to us e.g. playlist rules.

Simply create an artists playlist and have a side blade option (a popup menu would be easier, but anything "non-contextual" is being stripped from that) to scrape extra info for these artists. Not one or all but these. The rule could be path based, or genre, or whether they are an album artist, or date last scraped etc., or any rule combination.
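A minimal Python sketch of what rule-based selection might look like follows. It does not use Kodi's smart playlist format or API; the `Artist` fields, the rule helpers and `select_artists` are all hypothetical names chosen for illustration, modelled loosely on the rule kinds listed above (genre, album artist, date last scraped).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List, Optional


@dataclass
class Artist:
    # Hypothetical library record; the field names are assumptions.
    name: str
    genres: List[str] = field(default_factory=list)
    is_album_artist: bool = False
    last_scraped: Optional[date] = None  # None means never scraped


Rule = Callable[[Artist], bool]


def genre_is(genre: str) -> Rule:
    return lambda artist: genre in artist.genres


def album_artist(artist: Artist) -> bool:
    return artist.is_album_artist


def not_scraped_since(cutoff: date) -> Rule:
    return lambda artist: artist.last_scraped is None or artist.last_scraped < cutoff


def select_artists(artists: List[Artist], rules: List[Rule], match_all: bool = True) -> List[Artist]:
    """Keep artists matching all rules (or any rule when match_all is False)."""
    combine = all if match_all else any
    return [a for a in artists if combine(rule(a) for rule in rules)]


library = [
    Artist("Eric Clapton", ["Blues"], True, date(2016, 3, 1)),
    Artist("B.B. King", ["Blues"], True, None),
    Artist("Guest Vocalist", ["Pop"], False, None),
]

rules = [album_artist, genre_is("Blues"), not_scraped_since(date(2016, 1, 1))]
to_scrape = select_artists(library, rules)
print([a.name for a in to_scrape])
```

Only the artists that pass the rules are handed to the scraper, so the list it works through can be as small as the user (or Kodi behind the scenes) wants.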

In practice the average user wants Kodi to do it all automatically, but some users would make use of such a flexible facility. Behind the scenes Kodi could also use rules to decide what to scrape.

My innovation is to stop tying artist scraping solely to media source. In music we scrape library entries - artists, albums - the path of their related song files may make a good selection rule, but it may not. If you only have a few music media sources (I only have 1) then source path is not a good filter.

The other suggestions I have been given for reducing what we scrape are also useful - keeping the data between installs, only scraping the new things (we partly do that but could do better). Our current approach is breaking the suppliers of this wonderful data; we need to do something.


RE: Efficient music artist scraping - Powerhouse - 2016-03-18

(2016-03-17, 20:33)black_eagle Wrote:
Not very well is the answer !! Although MediaElch is great for video, music support is a relatively recent addition and checking with "Riding with the King", although the review is correct, the MBID is B.B. King, as is the artist. ME also has problems with my Now! collection. I know full well that they have MBID's because I've tagged them all with Picard, but ME doesn't identify a single one.

Part of the problem is as you described earlier - that scraping tends to depend upon a particular directory hierarchy (this appears to be how ME approaches things) but that it is perfectly possible to have artists within the music library that have no associated directory. Probably hundreds in mine !!

As a start, I agree that with albumartistsonly set, only album artists should be scraped. The GUI already allows for obtaining information for a single artist should a user require information on a non-scraped artist. Not only would this greatly reduce the load on metadata sites, it would also make scraping a library much faster.

I'm curious about your suggestion of smart playlist rules to decide which artists get scraped - how do you envisage this working ?

So I was wrong, MediaElch does not use MusicBrainz, instead it uses...

"There is only one scraper for music artists and albums: Universal Music Scraper. This scraper combines The Audio DB, AllMusic and Discogs for information about artists and albums. In the settings you can select the language (just used for The Audio DB) and which one you prefer. All images are scraped from Fanart.tv. Extra fanart images are also downloaded automatically. How much images should be loaded can be adjusted in the settings."

See this thread from Komet...http://community.kvibes.de/topic/show/music-scraping


RE: Efficient music artist scraping - Powerhouse - 2016-03-18

@black_eagle

I'm guessing by Now! you mean Now That's what I call Music. Doing a search gives me the following (based on what Komet has in MediaElch)...

On MusicBrainz I can find Now That's what I call Music 1980 (https://musicbrainz.org/release-group/8e97b13c-559e-3b66-8513-e001aed6c029)
On The AudioDB (site doesn't appear to be able to load at the moment).
On AllMusic I can find Now That's what I call Music 1980 (http://www.allmusic.com/album/now-thats-what-i-call-music%21-1980-mw0001725415)
On Discogs I can find Now That's what I call Music 1980 (https://www.discogs.com/Various-Now-Thats-What-I-Call-Music-1980-The-Millennium-Series/release/2572969)

I believe if you are using MediaElch and you search for a Now! compilation, you need to remove the exclamation mark (!), even though once the search in MediaElch is done it might display Now! (with the exclamation mark). There are lots of Movies and TV Shows that have this issue as well; I've just learned to remove special characters from searches in MediaElch.


RE: Efficient music artist scraping - DaveBlake - 2016-03-18

Enough of MediaElch, get your own thread :)

For me it is just more evidence that trying to tie artists to a folder does not work well.

Fundamentally, the way to make scraping more efficient - quicker for the user, less load on the metadata sites - is to scrape less. Be discriminating over what is scraped - don't repeat data requests, only request what we are going to use, and locally store what we get for use between versions.

Since even people with large music collections (like @zag and @martijn) often have only a few music sources entered into Kodi e.g. we just add the main path under which all our music is stored, using source as the way to select artists to scrape is not very discriminating. Maybe if separate sources did something users might separate them more, but really a user just wants to say "all my music is over here..." not faff about adding separate folders.

At the moment storing the online scraped data outside the library (as NFO files) is optional; the user has to export it. Currently the data is fetched from the metadata site and just written to the library. If the database is dropped, or the music files are moved about causing rescanning and library updates, the information is lost. Could we automate the export of this data, or at least make it a more obvious task? Would it be feasible to store info when it is scraped both inside the library and in a file(s) outside it? Or should we be looking at ways to stop the delete/insert approach Kodi uses for library updates from losing the artist and album metadata previously scraped from online sites?
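One way to picture "store it both inside the library and in a file outside it" is the Python sketch below. It is purely illustrative: the `ArtistInfoStore` class, the JSON file format and the `fake_fetch` function are assumptions made for the example (Kodi itself exports NFO files, and no real metadata site is contacted here). The point is only that a store keyed by mbid outside the database lets a rebuilt library get its info back without a new online request.

```python
import json
import tempfile
from pathlib import Path


class ArtistInfoStore:
    """Keeps scraped artist info in a JSON file outside the library database."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    def get_or_fetch(self, mbid, fetch):
        # Only go online when this mbid has never been stored before.
        if mbid not in self.data:
            self.data[mbid] = fetch(mbid)
            self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")
        return self.data[mbid]


calls = []


def fake_fetch(mbid):
    # Stand-in for an online query; records each call so we can count them.
    calls.append(mbid)
    return {"mbid": mbid, "biography": "placeholder text"}


with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "artist_info.json"

    first = ArtistInfoStore(store_file)
    first.get_or_fetch("mbid-0001", fake_fetch)

    # Simulate the library being dropped and rebuilt: a fresh store object
    # re-reads the file instead of asking the metadata site again.
    second = ArtistInfoStore(store_file)
    info = second.get_or_fetch("mbid-0001", fake_fetch)

print(len(calls))
```

Only one fetch happens even though the artist is looked up twice, once before and once after the simulated rebuild.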


RE: Efficient music artist scraping - Tolriq - 2016-03-18

IMO avoiding the remove/insert would solve a lot more things than just that ;)

But one of the major problems is, as you said, the way NFO files for artists are currently handled :(

I do not think exporting the data automatically is always wanted; it will export tons of NFO files and images to the user's folders, which can impact other software.

What would be cool is a way to have all those exports in another folder with a specific architecture, a little like subtitles: you can download them into the movie folder or into another folder.


RE: Efficient music artist scraping - DaveBlake - 2016-03-18

(2016-03-18, 11:00)Tolriq Wrote: What would be cool is a way to have all those exports in another folder with a specific architecture, a little like subtitles: you can download them into the movie folder or into another folder.

Yes, that was the kind of thing I had in mind. I don't want to fill up the userdata space, nor jumble extra files in with the music files as NFO files are currently, but let the user specify a location.
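For illustration, a user-specified export location with a fixed layout might be sketched in Python as below. Nothing here is an existing Kodi feature: the `artists/<key>` layout, the `artist_export_dir` function and the example root folder are all hypothetical. Keying the folder on mbid where available, rather than on the music file path, avoids the "artist without a folder" problem discussed earlier in the thread.

```python
from pathlib import PurePosixPath
from typing import Optional


def artist_export_dir(export_root: str, mbid: Optional[str], name: str) -> PurePosixPath:
    """Build the folder for one artist's exported info under a user-chosen root.

    Prefers the mbid as folder name; falls back to a sanitised artist name
    when no mbid is known. The layout is an assumption for this sketch.
    """
    if mbid:
        key = mbid
    else:
        # Replace characters that are awkward in folder names.
        key = "".join(c if c.isalnum() or c in " -_." else "_" for c in name).strip()
    return PurePosixPath(export_root) / "artists" / key


with_mbid = artist_export_dir("/mnt/kodi-meta", "mbid-0001", "B.B. King")
without_mbid = artist_export_dir("/mnt/kodi-meta", None, "AC/DC")
print(with_mbid)     # /mnt/kodi-meta/artists/mbid-0001
print(without_mbid)  # /mnt/kodi-meta/artists/AC_DC
```

Every artist gets a place for its NFO and artwork regardless of where, or whether, it has a folder among the music files.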